Abstract
This study provides an extensive review of over 200 journal papers on the use of Machine Learning (ML) algorithms to promote the sustainable management of marine and coastal environments. The research covers various facets of ML algorithms, including data preprocessing and handling, modeling algorithms for distinct phenomena, model evaluation, and use of dynamic and integrated models. Given that ML modeling relies on experience or trial-and-error, examining previous applications in marine and coastal modeling is beneficial. The performance of different ML methods used to predict wave heights was analyzed to ascertain which method performed best on various datasets. The analysis of these papers revealed that properly developed ML methods can be applied successfully to multiple aspects of marine and coastal management. Areas of application include data collection and analysis, pollutant and sediment transport, image processing and deep learning, and identification of potential regions for aquaculture and wave energy activities. Additionally, ML methods aid in structural design and optimization and in the prediction and classification of oceanographic parameters. However, despite their potential advantages, dynamic and integrated ML models remain underutilized in marine projects. This research provides insights into ML’s application and invites future investigations to exploit ML’s untapped potential in marine and coastal sustainability.
1 Introduction
Coastal areas are of vital significance because they support biodiversity, economic activity, cultural heritage, climate regulation, food security, recreational opportunities, and strategic interests (Neumann et al., 2017). Ensuring their sustainability, however, is a challenge that requires addressing various factors, including climate change adaptation, beach protection, and water quality management. One approach to ensuring the sustainability of coastal areas involves conducting a thorough examination of each contributing factor by employing data analysis and suitable methods. Effective data analysis and augmentation are, therefore, essential for informed decision-making and sustainable management of coastal areas.
The amount of data related to coastal systems has dramatically increased recently (Goldstein et al., 2019). These data, which often cover large areas and span long periods of time, are now available in high resolution and can be accessed quickly. This has led to more opportunities for research on the sustainability of activities taking place in coastal areas. However, handling large and complex datasets, as well as identifying their patterns and trends, is not a trivial task. Despite their widespread use and mathematical rigor, conventional statistical techniques, including descriptive statistics (Emmanouil et al., 2020), inferential statistics (Agarwal and Manuel, 2008), regression analysis (Davidson et al., 1996; Hall et al., 2002), correlation analysis (Szmytkiewicz et al., 2000; Kroon et al., 2008; Ruiz de Alegría-Arzaburu et al., 2010), Analysis of Variance (ANOVA) (Martins et al., 2010), and Principal Component Analysis (PCA) (Hua et al., 2007; Miller and Dean, 2007), have limitations when processing large and complex datasets, and can present challenges in terms of interpretability. This has prompted researchers to explore alternative, more sophisticated approaches such as ML, which enables researchers to draw insights from data in a more efficient, accurate and automated way.
ML is a rapidly growing field that has the potential to make significant contributions to the sustainable use and management of marine and coastal environments. It does so by helping to better understand and predict the impacts of human activities and natural phenomena on coastal ecosystems and to identify potential threats. The link between using ML to simulate coastal and marine events and sustainability lies in turning models into action. ML employs large amounts of data to create simulations for different scenarios, such as wave propagation or water quality management. These simulations help fine-tune interventions, such as improving wave energy converters or changing shipping paths to avoid pollution. Moreover, these simulations can guide efforts to adapt to and mitigate the effects of environmental changes, such as coastal erosion caused by rising sea levels. In essence, ML offers crucial insights that contribute to improved, sustainable care of coastal and marine environments.
Typically, the primary input of ML algorithms consists of a dataset in various forms, such as numeric data, images, digital elevation models (DEMs) collected by LiDAR (light detection and ranging), video, and geographic information system (GIS) data, which can be mapped and visualized using GIS. The main output of ML algorithms in coastal engineering varies depending on the specific application and dataset being used, and includes prediction [e.g., coastal flooding risk (Park and Lee, 2020), storm surge (Sajjad et al., 2020), wave height (Dogan et al., 2021), sediment transport (Pourzangbar et al., 2017b; 2017c; 2017a) and beach erosion (Beuzen et al., 2019)], image processing using satellite imagery data (Agrafiotis et al., 2019) or drone footage (Provost et al., 2020), pattern recognition [e.g., patterns of sediment transport (Liu et al., 2021)], placement optimization (Cuadra et al., 2016; Sarkar et al., 2016; Neshat et al., 2019), optimization by identifying the most efficient and cost-effective solutions for protecting the coast from erosion and flooding, monitoring (e.g., using sensor data to detect erosion or changes in water quality), anomaly detection (e.g., unusual changes in water quality), and decision making (Lazuardi et al., 2021) by providing decision support to coastal managers and engineers. However, the applicability of ML approaches in coastal engineering is influenced by various factors such as data quality, computational resources, the complexity of the coastal system, and the choice of appropriate algorithms.
Several ML methods have been used to study the sustainable use of coastal areas, including: Artificial Neural Networks (ANNs) used for predictions such as water quality (Chen and Ma, 2010), river classification based on the water quality index (Wong et al., 2021), wave height (Rao and Mandal, 2005; Günaydin, 2008) and beach erosion (Hashemi et al., 2010) and tidal prediction; Decision Trees (DTs) used for classifying the dominating environmental factors; Random Forests (RFs) used for regression and classification tasks, such as predicting the effect of human activities on the coastal environment and water quality index modelling (Sakaa et al., 2022); Support Vector Machines (SVMs) used for solving classification and regression problems, such as identifying the most vulnerable areas in coastal zones; K-Nearest Neighbors (KNN) used for clustering and classification tasks, such as grouping coastal regions based on their sustainability indicators; Ensemble Methods used for improving the accuracy of predictions and classifications, such as predicting the impact of climate change on sustainability of coastal activities, among others.
ML has been widely used in numerous research studies, but there still exists a knowledge gap regarding the selection of parameters, choice of predictive models (be they dynamic or static), domain adaptation, and use of integrated models for analyzing complex systems and evaluating the effects of multiple factors. In relation to data treatment, many existing works have relied on simple heuristic methods or rules of thumb; however, more rigorous mathematical and metaheuristic methods exist for data preprocessing and parameter identification, and these are highlighted in this paper. Choosing the correct model can be challenging, and there is no definitive method to identify the most suitable ML model for a given problem. In general, the ML approach used to solve a specific issue is selected through a process of trial and error. However, comparing how models perform under different conditions can aid in selecting the most suitable one for a specific issue. To the best of the authors’ knowledge, no single paper offers comprehensive information about the data preprocessing and preparation phase. This paper provides an extensive review of various methodologies employed in coastal engineering to handle datasets. The main focus of this paper is to understand how ML models contribute to the sustainable use and management of marine and coastal environments, rather than the technical intricacies of their setup. The primary goal is to provide a critical review of literature that utilized ML approaches to manage marine phenomena. This review sheds light on how to prepare parameters and datasets for input into the ML model, the pros and cons of various models, the suitability of ML methods for certain conditions, and their shortcomings and deficiencies.
Although numerous papers have discussed modeling coastal phenomena using experimental, numerical, and mathematical methodologies, the focus of the current paper is exclusively on literature that implemented ML techniques for modeling coastal and marine events. The selected literature spans a broad range of topics from data preprocessing and parameter considerations to different kinds of ML models used for various purposes. Due to the large number of published papers, the focus of our contribution was directed towards resources published in reputable international journals from publishers and societies such as Elsevier, Springer, IWA, Taylor and Francis, Wiley, and ASCE, among others. The papers were retrieved through online searches using relevant keywords. Among the journals, Coastal Engineering (Elsevier) with 18 papers and Ocean Engineering (Elsevier) with 17 papers had the most papers in this area. The majority of the sources are fairly recent, predominantly within the past 10 years. Nevertheless, this paper includes some older references that established the groundwork for newer methods. Roughly, fewer than 5% of the literature we reviewed was published before 2000, about 14% between 2000 and 2010, 22% between 2010 and 2015, and over 60% in the last 10 years.
While ML has been implemented in numerous studies, knowledge gaps exist in areas such as parameter selection, choice of models for making predictions (dynamic or static), domain adaptation, and the use of integrated models for modeling complex systems. The emphasis of the paper is on the contribution of ML models to the sustainable use and management of the marine and coastal environment, rather than on the technical details of their configuration.
The paper is structured as follows: Section 2 discusses the key components of data analysis and preprocessing, including data collection and preparation for the modeling process. Section 3 focuses on studies that have applied AI to coastal engineering for sustainable outcomes. The paper also evaluates the accuracy and robustness of the different models in Section 4. Finally, the paper summarizes all the information presented and concludes with a list of references.
2 Data preparation (preprocessing)
Data preparation involves transforming raw data into a format that can be used by ML algorithms for extracting insights or predicting outcomes. This process is vital in ML as it considerably affects the performance of the model (Kelleher et al., 2015). In the event of missing or invalid data, the algorithm either cannot process it or yields less precise, possibly erroneous results. This procedure starts with the acquisition of raw data (refer to Section 2.2), followed by data integration, which entails consolidating data from various sources into a unified dataset. This is succeeded by data cleansing to rectify missing values and outliers (refer to Section 2.3), and then selecting the most pertinent features from the input parameters (feature selection or dimensionality reduction) (see Section 2.4). Subsequently, feature engineering is undertaken, which involves generating new variables from existing parameters using dimensional analysis (DA). Lastly, data transformation is carried out, which involves altering the scale or distribution of variables, such as through data normalization. Figure 1 depicts the multiple phases required for data preprocessing and the methods linked with each step. The upcoming sections provide a detailed explanation of these methods.
FIGURE 1
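As a minimal sketch of the cleansing and transformation steps described above, the following Python snippet fills missing values and then applies min-max normalization. Mean imputation is only one possible choice and is an assumption made here for illustration; the function names and the sample wave heights are hypothetical and do not come from the reviewed studies.

```python
from statistics import mean


def impute_mean(values):
    """Replace missing entries (None) with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def min_max_normalize(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant column: avoid division by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical significant wave heights (m) with one missing record
wave_height = [1.0, 2.0, None, 3.0]
cleaned = impute_mean(wave_height)    # data cleansing
scaled = min_max_normalize(cleaned)   # data transformation
print(cleaned, scaled)
```

In practice the scaling bounds would be computed on the training subset only and reused for validation and test data, so that no information leaks between subsets.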
2.1 Marine data types
In coastal engineering, data can come in different forms (Huang et al., 2015) and can be classified into different types based on their identity, format, and structure. Some examples of coastal data types include:
(1) Numeric data (Timmermans et al., 2020), which includes measurements of various physical parameters such as water level, wave height, current velocity, and sediment concentration. Such data are typically collected using instruments such as tide gauges, wave gauges, current meters, and sediment samplers. Examples include time-series data such as ocean temperature records, sea level measurements, and storm surge data, which are represented by a sequence of observations or measurements taken at regular intervals over time.
(2) Image data (Vos et al., 2019; Turner et al., 2021), which includes aerial and satellite imagery, as well as ground-based photographs. These data can be used to study coastal morphology, vegetation, and land use patterns.
(3) Point Cloud data (Gomez, 2022), represented by a set of 3D points that can be used to create 3D models of coastal terrain and structures. Point cloud data is often collected using light detection and ranging (LiDAR) systems and can be used to create high-resolution digital elevation models (DEMs) of coastal topography.
(4) Video data (Smit et al., 2007; Kim et al., 2020; Kim and Kim, 2020), which includes footage captured by cameras. These data can be used to observe coastal dynamics and to measure the beach profile, the shoreline position, and wave breaking patterns.
(5) Text data (Brown et al., 2021), represented by written or spoken words, can be analyzed using natural language processing (NLP) techniques. Examples of text data in coastal engineering include social media posts, news articles, and scientific publications.
The following are the most well-known methods for collecting the data mentioned above: field observations, remote sensing measurements, experimental studies, numerical and mathematical models. Both the availability of equipment and the objective of the study influence the selection of the data collection medium (Prata et al., 2019).
2.2 Marine data resources
Data collection in the marine sciences relies principally on three distinct methods: in-situ observations, remote sensing techniques, and the use of mathematical and numerical models, as outlined by Verwega et al. (2021). In-situ data collection encompasses ship-based measurements, the deployment of moorings, gliders, autonomous underwater vehicles, drifters and floats, the use of sea-floor optic cables, and laboratory analyses. Field observations remain essential for the collection of real-world data on coastal processes, such as wave heights and tidal levels. In-situ instruments are highly accurate when properly maintained, but they may provide data at low temporal frequency when large areas must be covered. They offer insights into historical climate trends that are not available from remote sensing and are less affected by atmospheric conditions. These observations serve to validate numerical models that simulate coastal processes and predict the behavior of the coastal system, including wave patterns, tidal currents, and shoreline evolution.
Remote sensing involves acquiring data on coastal topography, bathymetry, and other significant parameters through satellite and airborne platforms. Remote sensing technologies are divided into three categories: satellite, ground-based, and drones (Elsayed et al., 2021). The data thus collected enable the generation of high-resolution coastal environmental maps. Although satellites are powerful tools, they face limitations in obtaining high-resolution regional-scale imagery. Clouds can hinder data capture, and high-resolution imagery can be challenging to interpret (Elsayed et al., 2021). A combination of satellite- and ground-based remote sensing and drones could be effective in future marine engineering evaluations. Economically, combining these tools may be comparable to in-situ techniques in terms of overall cost. Such technology could enable rapid, high-resolution water condition assessments and enhance our understanding of water resource processes. Mathematical and numerical models generate data by simulating real-life systems or processes using mathematical equations and algorithms (Xie and Arkin, 1996). They provide the capability to extend observational data, even to the point of simulating future climate scenarios (Eyring et al., 2016). Nonetheless, it is crucial to understand that these models only approximate real-world scenarios and can encompass spatial and temporal scales that exceed the scope of observational data (Matthes et al., 2020). The outputs from these models are typically available on a unique grid, contingent on the specific simulation. For instance, climate models customarily provide a four-dimensional space-time grid. Consequently, the comparison of model outputs with measurements invariably necessitates interpolation or data aggregation. Table 1 provides a detailed summary of the advantages and disadvantages associated with these diverse data collection methodologies.
TABLE 1
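| Data collection category | Method | Example (Refs.) | Accuracy | Spatio-temporal resolution | Selected measured parameters | Pros and cons |
|---|---|---|---|---|---|---|
| In situ | Sampling | Sampling kit (Paradinas et al., 2021) | Extremely precise | Monitoring a single spot | Pressure, wind speed, wave height, sea level | Very accurate; high spatial and temporal resolution; expensive method and characterized by a lot of outlier data |
| In situ | Land fixed instruments | Tide gauges (Qiao et al., 2023) | Good | Monitoring a single spot | Pressure, wind speed, wave height, sea level | Very accurate; high spatial and temporal resolution; expensive method and characterized by a lot of outlier data |
| In situ | Offshore fixed instruments | Buoy (Meng et al., 2021) | High | Monitoring a single spot | Pressure, wind speed, wave height, sea level | Very accurate; high spatial and temporal resolution; expensive method and characterized by a lot of outlier data |
| In situ | Offshore campaign | Moving instruments (Knight et al., 2020) | High | Monitoring a vast area | Pressure, wind speed, wave height, sea level | Very accurate; high spatial and temporal resolution; expensive method and characterized by a lot of outlier data |
| Remote sensing | Satellites | Satellite (Hagenaars et al., 2018; Turner et al., 2021) | Very high | Monitoring a vast area (meters to kilometers) | Mean wave period, significant wave height, ocean temperature, water level, waves and currents | Long-term operation; high data generation; lower cost compared to in situ methods; dependence on empirical equations; incomplete data availability; requirement for system calibration |
| Remote sensing | Land based instruments | Coastal radar (Gawehn et al., 2020); LiDAR (Irish and White, 1998); video monitoring (Soloy et al., 2021) | Good | Monitoring a vast area (meters to kilometers) | Mean wave period, significant wave height, ocean temperature, water level, waves and currents | Long-term operation; high data generation; lower cost compared to in situ methods; dependence on empirical equations; incomplete data availability; requirement for system calibration |
| Remote sensing | Onboarded instruments | Drones (Joyce et al., 2023) | High | Monitoring a vast area (meters to kilometers) | Mean wave period, significant wave height, ocean temperature, water level, waves and currents | Long-term operation; high data generation; lower cost compared to in situ methods; dependence on empirical equations; incomplete data availability; requirement for system calibration |
| Mathematical and numerical models | Basin wide models | The Copernicus Marine Service (Copernicus, 2023) | Depends on: benchmarking data; numerical scheme; and selected equations | Depending on the available computational power and input data | Wave height, sediment flux, flow properties, bed level | Synchronization is maintained between all computational outposts; cost-effective compared to in situ and remote sensing; need to validate with other methods |
| Mathematical and numerical models | Local wide models | NSWE (Pourzangbar and Brocchini, 2022); FUNWAVE (Shi et al., 2012); SWAN (Booij et al., 1997) | Depends on: benchmarking data; numerical scheme; and selected equations | Depending on the available computational power and input data | Wave height, sediment flux, flow properties, bed level | Synchronization is maintained between all computational outposts; cost-effective compared to in situ and remote sensing; need to validate with other methods |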
Detailed information on the various data collection methods in coastal engineering.
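Because model outputs live on a grid while in-situ instruments measure at a point, comparing the two requires interpolation, as noted in Section 2.2. The sketch below shows bilinear interpolation inside a single grid cell. It is an illustrative assumption about how such a comparison could be set up, not a method taken from the reviewed papers; the function name, coordinates, and wave heights are hypothetical.

```python
def bilinear(x, y, x0, x1, y0, y1, f00, f10, f01, f11):
    """Bilinearly interpolate a gridded field inside one rectangular cell.

    f00 is the field value at (x0, y0), f10 at (x1, y0),
    f01 at (x0, y1) and f11 at (x1, y1).
    """
    tx = (x - x0) / (x1 - x0)  # fractional position along x
    ty = (y - y0) / (y1 - y0)  # fractional position along y
    bottom = f00 * (1 - tx) + f10 * tx
    top = f01 * (1 - tx) + f11 * tx
    return bottom * (1 - ty) + top * ty


# Hypothetical modelled wave heights (m) at the four corners of a unit cell,
# interpolated to a buoy located at the cell centre
hs_at_buoy = bilinear(0.5, 0.5, 0.0, 1.0, 0.0, 1.0, 1.0, 2.0, 3.0, 4.0)
print(hs_at_buoy)
```

For time-dependent model output, the same idea would be applied along the time axis as well, or the observations would be aggregated to the model time step.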
2.3 Data cleaning: outlier detection
Several factors can influence the quality of observational data. These include inaccuracies in the instruments, malfunctions of the equipment, disruptions from external sources, mistakes during data conversion, communication mishaps, and significant unforeseen errors (Yu et al., 2022). Such anomalies can pose major threats to operational functionality, downstream operations, system resilience, and cleaner production (Ba-Alawi et al., 2021). Therefore, these should be detected promptly and their data rectified to ensure more realistic measurements.
Anomaly detection methods are generally categorized into various types (see Figure 2): Statistical Methods, which utilize the properties of the underlying data distribution to identify anomalies (Chandola et al., 2009); Distance-based Methods, which calculate the distance between data points and identify the outliers based on a certain distance threshold (Ramaswamy et al., 2000); Density-based Methods, which estimate the density of data points and identify outliers as those points that reside in low-density regions (Ester et al., 1996); Machine Learning-based Methods, which employ supervised, unsupervised, or semi-supervised ML algorithms to detect outliers (Pimentel et al., 2014); and Ensemble Methods, which combine multiple outlier detection algorithms to improve the overall performance (Zimek et al., 2012). The choice of method, or combination of methods for better results, depends on the nature of the data and the specific problem being addressed.
FIGURE 2
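To make the statistical category concrete, the snippet below applies the interquartile range (IQR) rule, a common distribution-based check. It is a minimal sketch: the IQR rule is chosen here only as a simple representative of statistical methods, and the function name, the multiplier `k`, and the sample readings are assumptions made for illustration.

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Hypothetical wave-gauge readings (m) containing one spurious spike
readings = [1.1, 1.2, 1.3, 1.2, 1.4, 1.3, 9.8]
flagged = iqr_outliers(readings)
print(flagged)
```

Whether a flagged value is an instrument error or a genuine extreme event (for example a storm wave) still has to be judged from the physical context before it is removed.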
Mahmoodi and Ghassemi (2018) used outlier detection algorithms to improve wave height predictions, while Oehmcke et al. (2015) demonstrated the effectiveness of ML for identifying significant events in marine long-term data. Daranda and Dzemyda (2020) developed a method combining the Density-Based Spatial Clustering of Applications with Noise (DBSCAN) clustering algorithm and k-nearest neighbors analysis for detecting marine traffic anomalies. These studies highlight the potential of leveraging advanced algorithms and ML in marine data analysis and decision-making. This section aims to provide a survey of contemporary outlier detection techniques, comparing their motivations, advantages, and disadvantages. Outliers can significantly impact the results, which makes addressing or eliminating them before analysis and model development crucial.
Considering the learning algorithm, three main methodologies exist for outlier detection (Hodge and Austin, 2004): 1) unsupervised approach, which uses a learning technique to identify outliers without prior knowledge of the data. The data is treated as a static distribution, and the most distant points are flagged as potential outliers; 2) supervised classification method, which requires pre-labeled data. It allows for online classification, where the classifier continuously learns the model and classifies new data as normal or abnormal, and finally 3) semi-supervised recognition technique, which only learns the normal class, using pre-classified data. It can distinguish new data as normal or novel based on its proximity to the boundary of normality. The choice of an outlier detection method depends on the data type, the number of vectors and attributes, speed and accuracy requirements, and the ability to accurately identify outliers. The key factors in choosing a method are selecting an algorithm that can handle the data and defining a suitable neighborhood for the outlier.
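The unsupervised, distance-based idea of flagging the most distant points can be sketched as follows. Each point is scored by the distance to its k-th nearest neighbour, and the highest score marks the strongest outlier candidate. This is a simplified illustration in the spirit of distance-based methods, not a reproduction of any cited algorithm; the function name and the vessel positions are hypothetical.

```python
import math


def kth_neighbor_distance(points, k=2):
    """For each point, return the Euclidean distance to its k-th nearest neighbour."""
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores


# Hypothetical vessel positions (arbitrary units); the last one is isolated
positions = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]
scores = kth_neighbor_distance(positions, k=2)
most_anomalous = max(range(len(positions)), key=scores.__getitem__)
print(most_anomalous, scores[most_anomalous])
```

The brute-force search above scales quadratically with the number of points, so larger datasets would need a spatial index. The choice of `k` plays the role of the neighbourhood definition mentioned above.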
2.4 Dimensionality reduction
Incorporating irrelevant parameters can result in intricate models that are considerably harder to interpret and execute than models developed using only the most crucial parameters (Pourzangbar, 2012). For this reason, the focus is placed on building ML models using the most crucial parameters. These parameters are not only essential for the model’s output, but are also uncorrelated with the other input parameters. To derive the most important dimensions (parameters) in the input space, there are several methods including min/max autocorrelation factor analysis (MAFA), dynamic factor analysis (DFA), Least Absolute Shrinkage and Selection Operator (LASSO), Independent Component Analysis (ICA), the multicollinearity test, and PCA. Table 2 summarizes some well-known dimensionality reduction approaches used in marine engineering. The multicollinearity test and PCA are explained below.
TABLE 2
| Method | References | Field of study | Remark |
|---|---|---|---|
| Tolerance and VIF based | Kaplan et al. (2010) | Explanatory Variables (Meteorological and hydrological variables) | • The best explanatory variable was identified, enhancing the overall model fit. • The selected variables were not collinear, ensuring independent influence on the model. |
| Tolerance and VIF based | Pourghasemi et al. (2018) | Landslide conditioning factors | • The study found no collinearity among the 17 landslide conditioning factors. • Logistic Regression and LogitBoost demonstrated superior performance compared to the NaïveBayes method. |
| Tolerance and VIF based | Izadi et al. (2021) | The euphotic depth, sea surface temperature, and chlorophyll | • Stacking the same variables across different days increased the feature space significantly, even though this approach may introduce potential multicollinearity. • Input parameters that have a high correlation with the output parameter are considered significant. |
| Tolerance and VIF based | El-Haddad et al. (2021) | Flood susceptibility prediction | • A multicollinearity analysis was conducted among nine flood-influencing factors. • The analysis showed that the tolerance (>0.1) and VIF (<10) of all flood-influencing factors meet the accepted standards, indicating no multicollinearity. • Therefore, all the independent flood-influencing factors can participate in the model establishment for the current study. |
| Tolerance and VIF based | Deroliya et al. (2022) | Flood risk mapping | • Variance Inflation Factor (VIF) analysis was performed. As a result, multicollinearity-free geomorphic flood descriptors (MFGFDs) were used as input features in the ML models. • Pearson correlation coefficients were calculated between all indicators with no high intercorrelations. As a result, the model was used to aggregate all available indicators after standardization, without the need for PCA. |
| MAFA and DFA based | Kuo et al. (2019) | Water quality variables | • MAFA results identified the main water quality variables in densely populated zones (Zones 1 and 3). • Primary water quality variations in agricultural cultivation zone were found. • DFA results suggest influence of domestic and municipal effluent pollutants. |
| F-test | Hessami et al. (2008) | Automated regression-based statistical downscaling tool | • They examined the level of statistical significance of the predictors. |
| PCA | Zhuang et al. (2022) | Port Planning | • PCA was used to predict the throughput of Dongjiakou Port. • The model’s effectiveness was verified by comparing predicted outputs with actual outputs. |
| PCA | Park and Oh (2022) | Ship Propulsion Engine | • Principal Component Analysis and K-Nearest Neighbors were used for data preprocessing. These techniques were employed to check if data were classified based on engine control characteristics. • Two types of Principal Components were derived using PCA to simplify the data collected in full-navigation mode. This approach was used to analyze the impact of each factor and reduce the analysis time. |
| PCA | Hua et al. (2007) | Temperature–Frequency Correlation | • PCA was initially used to extract principal components from the measured temperatures for dimensionality reduction. • The dominant feature vectors, along with the measured modal frequencies, were then used in a support vector algorithm to create regression models. |
| PCA | Arslan et al. (2020) | Coastline Extraction on hyperspectral imagery | • SVM and Neural Network classification accuracies did not significantly differ on the provided images. Therefore, it could be concluded that using Dimensionality Reduction (DR) strategies on the dataset does not have a significant impact on identifying the location of coastlines. |
| PCA | Freeman et al. (2021) | Marine hydrokinetic (MHK) turbines | • PCA enabled the maximum separation between classes to be depicted. Compared to other studies, the authors believe their method allows for more insightful inferences from PCA. • The authors’ proposed framework can identify the most important dimensions/features (i.e., RMS, Skewness) for fault detection when applying PCA on their feature space data matrix. |
| PCA | Sierra et al. (2017) | Analyzing coastal environments (grain size frequency curves) | • Functional Principal Component Analysis (FPCA) was identified as a suitable alternative with significant advantages over conventional vector analysis methods. • This is particularly true in the field of sedimentary geography studies. |
| PCA | Tayfur et al. (2013) | Sediment Load Prediction | • Predictive models were developed based on the outcomes of PCA. • The results show that PCA is beneficial in these types of studies. |
| PCA | El-Rahman (2016) | Hyperspectral image | • PCA was used as a data analysis technique to reduce the dimensions of hyperspectral images before the classification process. • This process employs an unsupervised Iterative Self-Organizing Data Analysis Technique (ISODATA) Algorithm. |
| LASSO | Tan et al. (2018) | Tropical cyclone | • After dimension reduction, the selected predictors retained a high explanatory capability for the complex information in the original data. • They also maintained the features of each predictor effectively. |
| LASSO | Tan et al. (2021) | Typhoon intensity | • Lasso and PCA were used for variable selection and dimensionality reduction. • A ML method, Hierarchical Bayesian Model (HBP), was employed to correct the storm intensity predicted by the Regional Climate Model (RCM). |
| ICA | Najafi et al. (2011) | Statistical downscaling of precipitation | • Performance assessment showed the procedure successfully selects predictors for downscaling Global Climate Model (GCM) data on both monthly and seasonal timescales. • The study indicated that by choosing the appropriate predictors, the Multiple Linear Regression (MLR) model is an effective method for precipitation downscaling. |
Some well-known dimensionality reduction approaches and their example references.
2.4.1 Multicollinearity
Multicollinearity is a common issue that can arise in regression analysis when two or more predictor variables in a model are highly correlated with each other. This can cause problems in the analysis, such as unstable and unreliable coefficient estimates. There are several methods to detect multicollinearity in a regression model. Here are a few commonly used tests:
• Correlation matrix: A correlation matrix can be used to identify the degree of correlation between each pair of predictor variables. High correlation coefficients (e.g., greater than 0.7 or 0.8) may indicate multicollinearity.
• Variance Inflation Factor (VIF) quantifies how much the variance of the estimated regression coefficients is expanded due to multicollinearity. Suppose there are three input parameters: $x_1$, $x_2$ and $x_3$, and the goal is to compute VIF for $x_1$. To accomplish this, we predict $x_1$ using linear regression based on $x_2$ and $x_3$. Next, we determine the correlation coefficient $R$ between the predicted and actual values of $x_1$, which we use to calculate VIF using the formula $\mathrm{VIF} = 1/(1 - R^2)$. Often, VIF values exceeding 5 or 10 serve as a benchmark for identifying variables that might pose problems.
If the VIF values for the independent variables are high, it indicates that multicollinearity is impacting the regression model. This issue might need to be resolved, possibly by removing one of the correlated variables, combining them, or applying methods such as ridge regression, or principal component analysis.
• Condition number: The condition number is a measure of the overall multicollinearity in the model and is calculated as the square root of the ratio of the largest to smallest eigenvalue of the correlation matrix. Condition numbers greater than 30 may indicate problematic multicollinearity.
• Eigenvalues: Eigenvalues of the correlation matrix can also be used to detect multicollinearity. Eigenvalues close to zero indicate that some predictors are nearly linear combinations of the others, i.e., high levels of multicollinearity.
• Tolerance (TOL) is another measure that can be used to detect multicollinearity in a regression model. It is the reciprocal of the VIF (variance inflation factor) and measures the proportion of the variance in a predictor variable that is not explained by the other predictor variables in the model. If the Tolerance value for a variable is close to 1, it suggests that there is no multicollinearity between that variable and the other predictor variables in the model. On the other hand, if the Tolerance value is close to 0, it indicates a high degree of multicollinearity between that variable and the other predictor variables in the model. In general, Tolerance values of less than 0.1 or 0.2 are indicative of problematic multicollinearity.
It is important to note that none of these tests can definitively prove the presence of multicollinearity, but rather provide evidence that it may be present in a model. Therefore, it is important to use multiple tests and to interpret the results in the context of the specific research question and data being analyzed.
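The diagnostics listed above can be sketched in a few lines. This is a minimal illustration, assuming NumPy is available; the function names and the synthetic predictors `x1`, `x2`, `x3` are hypothetical and chosen only to mimic the three-parameter example in the VIF description.

```python
import numpy as np

def vif_and_tolerance(X):
    """Return (vif, tolerance), one value per column (predictor) of X."""
    X = np.asarray(X, dtype=float)
    n_samples, n_features = X.shape
    vif = np.empty(n_features)
    for j in range(n_features):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        # Linear regression of predictor j on the remaining predictors (with intercept).
        A = np.column_stack([np.ones(n_samples), others])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        predicted = A @ coef
        # Correlation between predicted and actual values of predictor j.
        r = np.corrcoef(predicted, y)[0, 1]
        vif[j] = 1.0 / (1.0 - r ** 2)
    return vif, 1.0 / vif  # tolerance is the reciprocal of VIF

def condition_number(X):
    """Square root of the largest-to-smallest eigenvalue ratio of the correlation matrix."""
    corr = np.corrcoef(np.asarray(X, dtype=float), rowvar=False)
    eig = np.linalg.eigvalsh(corr)
    return float(np.sqrt(eig.max() / eig.min()))

# Synthetic data (an assumption for illustration): x1 is almost a copy of x2.
rng = np.random.default_rng(0)
x2 = rng.normal(size=200)
x3 = rng.normal(size=200)
x1 = 0.9 * x2 + 0.1 * rng.normal(size=200)
X = np.column_stack([x1, x2, x3])

vif, tol = vif_and_tolerance(X)
kappa = condition_number(X)
```

In this synthetic case the VIF of `x1` and `x2` should be large and their tolerance small, while `x3` stays close to a VIF of 1. Which threshold to act on remains a judgment call, as noted above.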
2.4.2 Principal component analysis
PCA can be utilized for dimensionality reduction (Pearson, 1901). PCA reduces the dimensions of datasets in a way that increases their interpretability. To achieve this, PCA maps the data onto a new coordinate system of uncorrelated variables (the principal components), ordered so that the first components capture as much of the variance as possible. Components that carry little variance are then discarded, so that the dimension is reduced while information loss is kept to a minimum. The initially proposed method was limited to up to three parameters; Harold Hotelling later described methods for computing multivariate PCA (Hotelling, 1933).
In the mathematical description, it is assumed that the input environment contains $n$ parameters and $m$ measurements for each parameter. Hence, the input matrix $X$ has $n \times m$ components. The input environment can be transformed into a feature environment whose dimensions are not dependent on each other. Accordingly, the feature environment can be represented by a matrix, i.e., $Y$. The transformation can be done using a whitening or sphering transformation matrix ($W$) as follows:

$$Y = W X$$

The primary goal of PCA is to identify the components of the transformation matrix in such a way that the new variables exhibit maximum discrepancy (represented by variance). With some mathematical manipulation, the following equation for the transformation matrix can be derived:

$$W = \Lambda^{-1/2} E^{T}$$

where $C$ is the covariance matrix of the input environment ($X$), $\Lambda$ is a diagonal matrix whose components are the eigenvalues ($\lambda_i$) of the matrix $C$, and $E$ is a matrix whose columns are the eigenvectors of $C$.
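The transformation can be checked numerically. The sketch below assumes the standard whitening form $W = \Lambda^{-1/2} E^{T}$, with $\Lambda$ and $E$ taken from the eigendecomposition of the covariance matrix of the centered input. It uses NumPy, and the function name `pca_whiten` and the synthetic data are hypothetical.

```python
import numpy as np

def pca_whiten(X):
    """X has shape (n_parameters, m_measurements). Returns (Y, W, eigenvalues)."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=1, keepdims=True)      # center each parameter
    C = np.cov(Xc)                              # covariance matrix (rows are parameters)
    eigvals, E = np.linalg.eigh(C)              # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]           # reorder to descending variance
    eigvals, E = eigvals[order], E[:, order]
    W = np.diag(eigvals ** -0.5) @ E.T          # whitening matrix
    Y = W @ Xc                                  # uncorrelated, unit-variance features
    return Y, W, eigvals

# Synthetic input (an assumption for illustration): the second parameter depends on the first.
rng = np.random.default_rng(1)
a = rng.normal(size=500)
b = 0.8 * a + 0.6 * rng.normal(size=500)
c = rng.normal(size=500)
X = np.vstack([a, b, c])

Y, W, eigvals = pca_whiten(X)
```

The covariance of `Y` is the identity matrix up to rounding, which confirms that the new variables are uncorrelated. Keeping only the first rows of `Y` (those with the largest eigenvalues) gives the reduced representation.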
2.5 Dimensional analysis
Although numerous methods exist for dimensional analysis (DA), the majority of studies employ the Buckingham π theorem to render the parameters dimensionless. Table 3 summarizes some of the studies that applied DA before feeding data to their ML models.
TABLE 3
| Method | References | Field of study | Dimensionless relationship (well-known numbers) |
|---|---|---|---|
| The Buckingham π theorem | Bateni et al. (2007) | Scour depth prediction around bridge piers using ML approaches | (The Reynolds Number and The Froude Number) |
| | Tayfur et al. (2013) | PCA and data-driven methods for enhancing sediment load prediction | (The Reynolds Number; The Froude Number; Dimensionless sediment diameter; The Mobility Number) |
| | Macayeal et al. (2011) | Iceberg-capsize tsunamigenesis | The Froude Number |
| | Jayaratne et al. (2016) | Tsunami-Induced Local Scour and Failure Mechanisms in Coastal Structures | The Shields parameter |
| | Deng et al. (2016) | Wave force on a vertical cylinder | (The Reynolds Number; Scattering parameter; The Keulegan–Carpenter number; The Froude Number) |
| | Ranasinghe et al. (2010) | Reaction of the Shoreline to a Single Submerged, Shore-Parallel Breakwater | |
| | Nakamura et al. (2008) | Tsunami-Induced Scour Surrounding a Square Structure | |
| | Peña et al. (2011) | Comparative experimental analysis of wave transmission coefficients, mooring line and module connector forces across various floating breakwater designs | |
| | Hong et al. (2013) | Propeller Jet-Induced Scour | (The Froude Number; Offset height ratio; Relative Submergence) |
| | Karimpour et al. (2016) | Impacts of wind waves and currents on saltmarsh fringe deterioration | (The Reynolds Number) |
| | Santamaria Cervantes et al. (2022) | Uncertainties in coastal protection slope formulas | (The Reynolds Number; Wave Steepness; Relative Water Depth) |
| | Kitsikoudis et al. (2015) | Evaluating sand-bed river sediment transport | (The Reynolds Number; The Froude Number; The Shields parameter) |
Comparative overview of various studies utilizing DA and their derived dimensionless parameters.
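Two of the dimensionless numbers that recur in Table 3 can be computed as follows. This is a minimal sketch using the common textbook definitions (depth-based Froude number, Reynolds number with kinematic viscosity); the function names, default constants, and sample values are assumptions for illustration and are not taken from the cited studies.

```python
import math

def froude_number(velocity, depth, g=9.81):
    """Fr = U / sqrt(g * h); inputs in any consistent unit system."""
    return velocity / math.sqrt(g * depth)

def reynolds_number(velocity, length, kinematic_viscosity=1.0e-6):
    """Re = U * L / nu; the default nu is roughly that of water (m^2/s)."""
    return velocity * length / kinematic_viscosity

# Hypothetical flow: 1.5 m/s over 2 m depth, past a 0.5 m wide pier.
fr = froude_number(velocity=1.5, depth=2.0)
re = reynolds_number(velocity=1.5, length=0.5)

# The same flow expressed in centimetres gives the same Froude number,
# which is the point of feeding dimensionless inputs to an ML model.
fr_cm = froude_number(velocity=150.0, depth=200.0, g=981.0)
```

Because the groups do not depend on the unit system or the physical scale, a model trained on them can, in principle, be applied to laboratory and field data alike.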
2.6 Normalization
Normalization transforms data onto a common scale, which removes the effect of unit differences, avoids bias in statistical analyses, and makes data from different sources comparable. It also plays a crucial role in efficient machine and deep learning by ensuring that large numerical inputs are processed effectively (Van Komen et al., 2022). The choice of normalization method depends on the specific requirements of the data and the problem being solved. Some widely used methods for data normalization are summarized in Table 4.
TABLE 4
| Method | Equation | Studies using this method |
|---|---|---|
| Min-max normalization | $x' = r_{\min} + \dfrac{(x - x_{\min})\,(r_{\max} - r_{\min})}{x_{\max} - x_{\min}}$ | Pourzangbar et al. (2017b); Kramer (2013) |
| Z-score normalization | $x' = \dfrac{x - \mu}{\sigma}$ | Ewuzie et al. (2021); Masmoudi et al. (2021) |
| Sigmoid normalization | $x' = \dfrac{1}{1 + e^{-x}}$ | Latif et al. (2023) |
| Log scaling | $x' = \log(x)$ | Bai et al. (2015); Pourzangbar et al. (2017a) |
Well-known normalization techniques used in ML modeling.
In Table 4, the transformed data, referred to as $x'$, is obtained by normalizing the original data ($x$) in a new range. The original data is contained within a vector, denoted as $X$, and its minimum and maximum values are represented as $x_{\min}$ and $x_{\max}$, respectively. The chosen minimum and maximum values for the transformed range are $r_{\min}$ and $r_{\max}$, which are typically set to zero and one, respectively. $\mu$ is the mean of the data, and $\sigma$ is the standard deviation of the data.
Min-max normalization rescales a feature linearly to a specific range, usually between 0 and 1: the minimum value of the data is mapped to 0, the maximum value to 1, and intermediate values to proportional positions in between. To avoid having zero values in the model, an alternative approach is to use a narrower target range, such as 0.1 to 0.9. It is a commonly used method for transforming variables so that they are comparable.
Z-score normalization, also known as standardization, rescales the data to a mean of 0 and a standard deviation of 1. It centers the data around the mean without changing the shape of the distribution, allowing for easier comparison of values. It is commonly used in various fields, such as statistics, ML and data analysis.
Sigmoid normalization uses a sigmoid function to transform the data, proving useful in instances where the data distribution is asymmetrical. The sigmoid function maps any input value to a value between 0 and 1, and it is commonly used in ML and ANN models to represent a probability or to rescale data. Additionally, the sigmoid function is differentiable, which makes it useful in optimization problems and backpropagation in neural networks.
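The three rescaling methods described above can be sketched with the Python standard library alone. The function names and the sample wave heights are hypothetical, and the population standard deviation is an assumption (some studies use the sample standard deviation instead).

```python
import math
from statistics import mean, pstdev

def min_max(values, new_min=0.0, new_max=1.0):
    """Linearly rescale values so that min -> new_min and max -> new_max."""
    lo, hi = min(values), max(values)
    scale = (new_max - new_min) / (hi - lo)
    return [new_min + (v - lo) * scale for v in values]

def z_score(values):
    """Center on the mean and divide by the (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

def sigmoid(values):
    """Squash each value into the open interval (0, 1)."""
    return [1.0 / (1.0 + math.exp(-v)) for v in values]

# Hypothetical significant wave heights in metres, with one large value.
wave_heights = [0.5, 1.0, 1.5, 2.0, 4.0]

unit_range = min_max(wave_heights)               # range 0 to 1
shifted_range = min_max(wave_heights, 0.1, 0.9)  # avoids exact zeros
standardized = z_score(wave_heights)
squashed = sigmoid(standardized)                 # sigmoid applied to centered data
```

Applying the sigmoid to standardized rather than raw values is a design choice made here so that the outputs are spread around 0.5; it is not prescribed by the source.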
In coastal phenomena, the relationship between inputs and outputs is typically nonlinear, but certain models, such as the M5 model tree, which fits linear equations at its leaves, cannot represent such nonlinearity directly. To address this limitation, M5 models have been implemented using a logarithmic form for both inputs and outputs (i.e., the natural logarithm of inputs and outputs). This logarithmic form is more accurate than a linear formulation because it better captures the nonlinear nature of the contributing parameters (Pourzangbar et al., 2017a; Afsarian et al., 2018). Log scaling entails transforming data points through the application of a logarithmic function. The logarithm compresses large values much more strongly than small ones, helping to make right-skewed data more symmetrical and manageable for analysis. The base of the logarithm (e.g., 10, 2, or $e$ for the natural logarithm) is selected according to the needs of the data and the analysis. Despite its advantages, normalization may result in a loss of interpretability, sensitivity to outliers (as seen in techniques like min-max scaling and z-score), loss of information, dependence on the entire dataset, impacts on categorical features, and varying sensitivity across algorithms.
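The benefit of the logarithmic form for a linear model can be seen from a short worked equation. As an illustrative assumption (not a relationship taken from the cited studies), suppose the output $y$ follows a power law in two inputs $x_1$ and $x_2$ with constants $a$, $b_1$, and $b_2$. Taking the natural logarithm of both sides turns the product into a sum that a linear equation can fit exactly:

```latex
y = a\,x_1^{b_1}\,x_2^{b_2}
\quad\Longrightarrow\quad
\ln y = \ln a + b_1 \ln x_1 + b_2 \ln x_2
```

A model fitted in log space returns $\ln y$, so predictions are mapped back to the original scale with $y = e^{\ln y}$.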
3 AI learning algorithms and their application in marine/coastal engineering
3.1 Supervised-based ML methods
Supervised ML is a powerful approach that requires labeled data for model training. Its versatility permits its use across a variety of applications, such as image and speech recognition, natural language processing, and predictive analytics. Common algorithms used in supervised learning include linear regression, logistic regression (LR), decision trees, random forests, support vector machines, and neural networks. A key advantage of supervised learning is its capacity to generate precise predictions for novel and unseen data (Jiang et al., 2020). However, it also has certain drawbacks, including the requirement for labeled data, the dependence on the quality and quantity of the training data, and the potential for overfitting. Despite these challenges, supervised learning is seen as an essential method in ML and data science, demonstrating high accuracy and less computational time compared to physical models. Although marine processes are inherently complex, supervised ML models have demonstrated benefits in understanding coastal phenomena and have therefore found extensive application in coastal engineering to drive innovative models and solve intricate problems (as summarized in Table 5). Supervised ML models have been employed to predict wave parameters such as significant wave height and period, wave reflection and transmission coefficients (van Gent et al., 2007; Gandomi et al., 2020; Kuntoji et al., 2020), tide levels (Lee, 2004), ocean currents and wind fields (James et al., 2018; Shamshirband et al., 2020), wind characteristics under future climate change scenarios (Yeganeh-Bakhtiary et al., 2022), flood inundation using a Gaussian process model (Donnelly et al., 2022), and breakwater stability number and wave overtopping discharge, among others. Various ML models, such as ANN and SVM, can be employed to make these predictions.
ML models have also found application in morphological and morphodynamic predictions, including profile elevation, area, and length, based on parameters like wind speed, direction, wave height, and beach angle (Hashemi et al., 2010).
TABLE 5
| Learning approach | Model type | Algorithm | Output (Reference) |
|---|---|---|---|
| Supervised learning | Classification | ANN | Coastal vulnerability map (Ennouali et al., 2023); Coastal waters classification (Pereira and Ebecken, 2009); Coastal altimetric waveforms (Xu et al., 2021); Sea surface temperature imagery (Reggiannini et al., 2022) |
| | | SVM | |
| | | RF | |
| | | K-Nearest neighbor | |
| | | Naive-Bayes classifier | |
| | Regression | ANN | Wave condition (James et al., 2018); Significant wave height (Ali et al., 2023); Breaking wave height (Duong et al., 2023); Sediment load (Latif et al., 2023); Wave attenuation (Kim et al., 2022) |
| | | SVM | |
| | | Regression (linear, logistic) | |
| Unsupervised learning | Clustering | K-means and K-median | Seabed color (Wattelez et al., 2022); Land cover classification (Moody et al., 2014); Characteristics of wastewater discharges (Di et al., 2019); Smart port construction (Yao et al., 2018); Spatiotemporal outlier detection (Chen et al., 2016); Outliers in coastal water temperature (Cho et al., 2013); Coastal environmental and atmospheric data reduction (Mészáros et al., 2022); Surface water quality (Moncada et al., 2021) |
| | | Hierarchical clustering | |
| | | Density-based clustering | |
| | | Gaussian mixture models | |
| | Anomaly detection | Statistical-based | |
| | | Distance-based | |
| | | Clustering-based | |
| | | Density-based | |
| | Dimensionality reduction | PCA | |
| Reinforcement learning | Model-free | Q-learning | Real-time control of coastal urban stormwater systems (Bowes et al., 2022); Flood mitigation (Bowes et al., 2021); Maximize energy efficiency (Sarkar et al., 2022) |
| | | Hybrid | |
| | | Policy optimization | |
| | Model-based | Q-learning | |
| | | Given the model | |
Various ML learning approaches utilized in coastal studies, along with their associated models and methods.
3.2 Unsupervised-based ML methods
Unsupervised learning is a form of ML that functions without predefined labels or target outcomes (Bishop and Nasrabadi, 2006). Its main purpose is to independently discover patterns, structures, and relationships in data. Common applications include clustering, anomaly detection, and dimensionality reduction. Clustering groups similar data points, anomaly detection spotlights unusual patterns (as detailed in Section 2.3), and dimensionality reduction simplifies the number of features while preserving essential information (as seen in Section 2.4). Algorithms like k-means clustering, hierarchical clustering, PCA, and autoencoders are frequently used in unsupervised learning to identify patterns in data. While unsupervised learning can pose challenges due to the lack of a distinct optimization goal, it still holds a vital position in ML, contributing to advancements in fields such as computer vision, natural language processing, and recommendation systems. In the context of coastal engineering, k-means clustering can be used to classify centroid values for data like the maximum oceanic wind. Average centroid clustering can be obtained from both the previously chosen values and the currently selected clustering data (Baboo and Tajudin, 2013). PCA can be employed in coastal engineering to examine correlation matrices (Roseman et al., 2005) and pinpoint major changes in beach profiles and sand grain distributions (Tsujimoto et al., 2012). Moreover, PCA and hierarchical clustering can help characterize coastal plane shape and hydrodynamics. For instance, the form of arc-shaped coasts, largely influenced by geological structure, can be divided into four broad categories that reflect actual conditions using clustering (Scott et al., 2011). By identifying key data components, PCA can aid in elucidating the underlying patterns and structures of the data.
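The k-means procedure mentioned above alternates between assigning each data point to its nearest centroid and recomputing each centroid as the average of its members. A minimal one-dimensional sketch follows; the wind-speed values, the function name, and the deterministic initialization are assumptions made for illustration and do not reproduce the cited studies.

```python
def kmeans_1d(values, k, n_iter=100):
    """Cluster scalar values into k groups. Returns (centroids, labels)."""
    ordered = sorted(values)
    # Deterministic start: spread the initial centroids across the sorted data.
    centroids = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    labels = []
    for _ in range(n_iter):
        # Assignment step: each value joins its nearest centroid.
        labels = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        # Update step: each centroid becomes the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            new_centroids.append(sum(members) / len(members) if members else centroids[c])
        if new_centroids == centroids:   # converged
            break
        centroids = new_centroids
    return centroids, labels

# Hypothetical maximum wind speeds (m/s) from calm, moderate, and storm conditions.
wind = [3.1, 2.8, 3.5, 12.0, 11.4, 12.9, 25.3, 24.1, 26.0]
centroids, labels = kmeans_1d(wind, k=3)
```

No labels are supplied; the three groups emerge from the data alone, which is what distinguishes this from the supervised methods of Section 3.1.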
3.3 Reinforcement-based ML methods
Reinforcement learning (RL) is a type of ML where a program, known as an agent, learns to perform tasks by getting feedback from its environment in the form of rewards or penalties (Rengarajan et al., 2022). The agent makes a series of decisions in a changing environment, aiming to learn the optimal way (or policy) to maximize rewards over time. This process is typically structured as a Markov Decision Process (MDP), encompassing states, actions, transition functions, and reward functions. There are two main types of reinforcement learning algorithms: model-based and model-free. Model-based RL is like making a map to understand the surroundings. Model-free RL, on the other hand, does not make a map; it figures out what to do based on where it is at the moment. So, model-based RL is more about planning ahead, while model-free RL is more about learning on the go (Plaat et al., 2023). Model-free methods, like Q-learning, do not need a model of the environment; they estimate the expected total of future rewards for each possible action at each state using the Bellman equation. Q-learning has been used successfully in many different tasks, which is why it is one of the most commonly used model-free RL algorithms. In coastal engineering, RL can be used to develop control policies to reduce the risk of flooding (Bowes et al., 2021). Deep reinforcement learning (DRL), an advanced form of RL, can be used to control devices that convert wave energy, and has been found to work better than traditional control methods (Anderlini et al., 2020). DRL can also adjust itself to changes in system dynamics, allowing for control even when faults occur. Moreover, RL has been used to maximize the electricity produced by wave energy converters (Zou et al., 2022). In addition, a type of RL called multiagent reinforcement learning can simulate the social and economic effects of sea level rise. This can be a useful tool for planning scenarios, analyzing costs and benefits, and optimizing strategies to adapt to changes (Shuvo et al., 2022).
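The Q-learning update described above can be illustrated on a toy problem. The sketch below assumes a hypothetical five-state chain in which the agent starts at state 0 and receives a reward of 1 on reaching state 4. The environment, the function names, and the hyperparameters are illustrative only and are unrelated to the cited flood-control or wave-energy studies.

```python
import random

N_STATES, ACTIONS = 5, (0, 1)   # action 0 = move left, action 1 = move right
GOAL = N_STATES - 1

def step(state, action):
    """Deterministic toy environment: returns (next_state, reward, done)."""
    nxt = max(0, state - 1) if action == 0 else min(GOAL, state + 1)
    reward = 1.0 if nxt == GOAL else 0.0
    return nxt, reward, nxt == GOAL

def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]   # Q-table: q[state][action]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy choice; ties are broken at random.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.choice(ACTIONS)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            nxt, reward, done = step(state, action)
            # Bellman target: immediate reward plus discounted best future value.
            target = reward + (0.0 if done else gamma * max(q[nxt]))
            q[state][action] += alpha * (target - q[state][action])
            state = nxt
    return q

q = q_learning()
policy = [0 if q[s][0] > q[s][1] else 1 for s in range(GOAL)]
```

No transition model is stored anywhere: the agent only updates its table from observed `(state, action, reward, next_state)` tuples, which is what makes the method model-free.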
Table 5 summarizes the various ML approaches and their corresponding model types. Each model type utilizes a distinct set of algorithms. For instance, in the case of classification tasks, ANNs or SVMs may be utilized. The final column of the table lists research studies that applied each learning approach to a specific coastal process or event.
3.4 AI contribution to the sustainability of marine environments
Predictive models, such as statistical, numerical, or ML models, play a vital role in marine and coastal engineering to safeguard structures from natural forces. Statistical models use past data to forecast future conditions, while numerical models simulate the event using mathematical equations and formulas. ML models, using artificial intelligence (AI), learn from past data for prediction purposes. Each model has its unique approach and is chosen based on data availability and specific project needs.
Various ML techniques have been implemented in the study of coastal and marine environments. Figure 3, sourced from Scopus, provides a visual representation of the percentage of published papers that used different ML methods since the year 2000. Upon reviewing this figure, it is evident that Principal Components Regression (PCR), Linear Model (LM), Regression Tree (RT), and ANN are the most frequently employed ML algorithms for analyzing coastal and marine phenomena. However, certain ML techniques, such as General Regression Neural Networks (GRNN), M5 model tree, Bayesian Model Averaging method (BMA), Generalized Boosted Regression (GBM), and Extreme Gradient Boosting (XGBoost), have been applied less frequently in the investigation of coastal and marine events.
FIGURE 3
Figure 4 shows the application trend of different ML approaches for coastal and marine phenomena. Previously, techniques such as Deep Neural Networks (DNN), Convolutional Neural Networks (CNN), and RF were sparingly employed in diverse studies. However, there has been a significant increase in their use over the past 5 years, demonstrating a growing reliance on these methods in recent research.
FIGURE 4
Figure 5 illustrates the trend of various ML algorithms in coastal and marine applications since 2008. Methods that are used infrequently are colored in red (these less common approaches are not shown in Figure 4).
FIGURE 5
3.4.1 Prediction of oceanographic and morphologic parameters
Researchers use ML algorithms and soft computing techniques to predict oceanographic and morphological parameters, as shown in Figure 6. These methods include ANNs, SVMs, Support Vector Regression (SVR), Fuzzy Logic (FL), evolutionary algorithms such as Genetic Programming (GP), and decision trees (DTs), among others. Predictive models are widely used in oceanography and coastal management. Their accuracy critically depends on several factors, including the dataset used for training, the type and configuration of the ML model, the tuning parameters, the termination condition, and the input and output parameters. Which algorithm and parameter settings are most valuable therefore depends on the problem being addressed.
FIGURE 6
ANNs, SVRs, the M5 decision tree algorithm, and Recurrent Neural Networks (RNNs), including Long Short-Term Memory (LSTM) models, are used to predict wave heights, as per studies by Duong et al. (2023) and Rizianiza and Aisjah (2015). These ML techniques have shown reliable wave prediction capabilities, maintaining accuracy up to 72 h ahead (Jain and Deo, 2008). The use of intact structural data for predicting significant wave heights has been explored, with emphasis on the critical role of data quality in training ANNs for wave height predictions (Ciortan and Rusu, 2018; Demetriou et al., 2021). ANNs have also been implemented to estimate wave breaking heights considering various factors like seabed slope, water depth, and deep-sea wavelength (Duong et al., 2023). In the field of marine energy forecasting, researchers have used multi-class classification methods with ordinal classifiers, such as SVOREX and SVORIM, yielding precise results (Fernández et al., 2015). RNNs, especially LSTM models, have been employed to predict motion responses in irregular wave patterns (Kagemoto, 2020). Table 6 provides a summary of 10 highly cited papers focused on predicting significant wave height using ML algorithms. The majority of these studies used meteorological data and past wave height as input parameters. The results demonstrate that LSTM neural networks, ANN, kernel-based predictors like SVM and SVR, as well as decision trees, are capable of accurately predicting wave height.
TABLE 6
| References | Method | Dataset | Inputs | Results |
|---|---|---|---|---|
| Mahjoobi and Etemad-Shahidi (2008) | C5 algorithm; CART; ANN | Wind and wave data from Lake Michigan, 2000–2004 | Wind speed; Wind direction | Decision trees, having similar error statistics to ANNs and an acceptable error range, are efficient for predictions and advantageous because they represent classification rules and linear equations. |
| Mahjoobi and Adeli Mosabbeb (2009) | SVM (RBF); SVM (polynomial); ANN (MLP); ANN (RBF) | 2086 records for training and 2007 records for testing | Wind speed | In modeling the wind speed, SVM outperformed ANN. |
| Fernández et al. (2015) | ELMOR*; KDLOR*; ONN*; POM*; SVOREX*; SVORIM*; SVR | Meteorological reanalysis and standard data from buoys were collected for the entire years of 2012 and 2013, from January 1st to December 31st. | Meteorological variables including air temperature, sea level pressure, the zonal component of the wind and the meridional component of the wind. | In modeling the meteorological variables, ordinal classifiers (SVOREX and SVORIM) outperformed nominal classification and regression methods. |
| Cornejo-Bueno et al. (2016) | GGA-ELM | Data for two complete years (1st January 2009–31st December 2010) are used. | Wind direction and speed; Gust speed; Significant wave height; Dominant and Average wave period; Direction DPD; Atmospheric pressure; Air and water temperature | A hybrid GGA-ELM approach is proposed for accuracy in prediction of wind speed and direction. The GGA-ELM selected features were tested using ELM and Support Vector Machine on a real-world problem, yielding good results. |
| Berbić et al. (2017) | ANN and SVM | Collected wave height data from two Adriatic Sea locations, November 2007–2008. | Previous wave heights | The study utilized Weka software to predict significant wave heights using ANN and SVM methods, incorporating wind data. |
| Kumar et al. (2017) | MRAN*; GAP-RBF*; SVR | Data from 13 stations across diverse global regions was collected from 2011–2015 for the study. | Wind speed; Wave height | MRAN and GAP-RBF outperform SVR and ELM in daily wave height prediction, with MRAN surpassing GAP-RBF, using minimal network resources and accurately predicting significant wave heights. |
| Kumar et al. (2018) | SLFN; Ens-ELM; SVR; OS-ELM | The model is trained via 10 diverse terrain stations from 2011 to 2014, and was tested using data from early to mid-2015. | Wave and atmospheric data | The Ens-ELM outperforms ELM, OS-ELM and SVR in the daily wave height prediction. |
| Ali and Prasad (2019) | ICEEMDAN-ELM | Hs data from Queensland, 2000–2018, at half-hourly intervals | Wave height at previous times | Hybrid ICEEMDAN-ELM outperforms comparative models like RF, ELM and MLR in Australia’s energy sites. |
| Fan et al. (2020) | LSTM neural network | Hourly data from ten global ocean buoys was used. The number of data points is 428770. | The previous wave height, sea surface temperature, wind direction and speed, and pressure | In predicting wave height, LSTM showed strong long-term prediction capacity, with the proposed SWAN-LSTM model improving prediction accuracy by over 65% compared to the standard SWAN model. |
| Shamshirband et al. (2020) | ANN; ELM*; SVR | The wave data recorded in Bushehr and Assaluye ports during 2008 are employed as target variables | Wind speed; Wave height | All models, such as ANN, ELM and SVR, effectively predict outcomes, with a nested grid approach proving efficient for the study bathymetry. The ELM slightly outperforms ANN and SVR, despite generally similar performances. |
Details of the selected reviewed papers, where the ML methods were used to predict the wave height.
*MRAN: Minimal Resource Allocation Network; GAP-RBF: Growing and Pruning Radial Basis Function; ELM: Extreme Learning Machine; GGA-ELM: Grouping Genetic Algorithm-Extreme Learning Machine approach; Ens-ELM: Ensemble of Extreme Learning Machine; OS-ELM: Online Sequential ELM; KDLOR: Kernel Discriminant Learning for Ordinal Regression; SVORIM: SVOR with Implicit constraints; SVOREX: SVOR with Explicit constraints; POM: Proportional Odds Model; ONN: Ordinal Neural Networks; ELMOR: ELM adapted to ordinal regression.
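Several studies in Table 6 use previous wave heights as model inputs. One common way to prepare such data, sketched here under that assumption, is a sliding window that pairs the last few observations with the value to be predicted. The function name, the window length, and the sample series are hypothetical and not taken from any of the listed papers.

```python
def make_lagged(series, n_lags, horizon=1):
    """Build (inputs, targets): n_lags past values -> value `horizon` steps ahead."""
    inputs, targets = [], []
    for t in range(n_lags, len(series) - horizon + 1):
        inputs.append(series[t - n_lags:t])      # the n_lags most recent observations
        targets.append(series[t + horizon - 1])  # the value to be forecast
    return inputs, targets

# Hypothetical hourly significant wave heights (m).
hs = [1.0, 1.2, 1.5, 1.4, 1.1, 0.9]

X, y = make_lagged(hs, n_lags=3)                 # one step ahead
X2, y2 = make_lagged(hs, n_lags=3, horizon=2)    # two steps ahead
```

The resulting pairs can be fed to any of the regression models in the table; a longer `horizon` corresponds to a longer forecast lead time.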
To enhance understanding of the effectiveness of various ML methods in predicting wave height, visual representations of the correlation coefficient and Root Mean Square Error (RMSE) values for different ML techniques applied across multiple datasets have been created (Figure 7). To achieve this, we carefully selected studies that used several ML methods for wave height predictions, ensuring each study used a consistent dataset. This allowed for a visual representation of the performance of these ML techniques with specific datasets. By comparing the overall performance of these ML models across various datasets, certain conclusions can be drawn.
• ANN and SVR algorithms are commonly used in predicting wave height.
• The number of neurons in the hidden layers of ANNs has only a slight influence on the accuracy of the model.
• Integrated algorithms, like ICEEMDAN-ELM, exhibit superior performance in terms of accuracy and error indices compared to other ML methods.
• There has been a significant increase in the adoption of ML algorithms, especially integrated algorithms, in recent years (see Figure 8).
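The two indices used in this comparison, the correlation coefficient and RMSE, can be computed as in the following sketch. The function names and the observed and predicted wave heights are hypothetical; the Pearson form of the correlation coefficient is assumed.

```python
import math

def rmse(observed, predicted):
    """Root mean square error between two equally long sequences."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))

def correlation_coefficient(observed, predicted):
    """Pearson correlation coefficient between observations and predictions."""
    n = len(observed)
    mo, mp = sum(observed) / n, sum(predicted) / n
    cov = sum((o - mo) * (p - mp) for o, p in zip(observed, predicted))
    so = math.sqrt(sum((o - mo) ** 2 for o in observed))
    sp = math.sqrt(sum((p - mp) ** 2 for p in predicted))
    return cov / (so * sp)

# Hypothetical wave heights (m): the model overpredicts by a constant 0.5 m.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.5, 3.5, 4.5]

error = rmse(observed, predicted)
r = correlation_coefficient(observed, predicted)
```

In this example the correlation is perfect even though every prediction is biased, while the RMSE reflects the bias. This is one reason to report both indices together.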
FIGURE 7
FIGURE 8
The M5 decision tree algorithm, ANNs, and gradient boosting decision trees serve as robust tools for predicting wave overtopping discharge on coastal infrastructure such as breakwaters. For wave overtopping and runup, the M5 decision tree algorithm exhibits promising capabilities for predicting wave runup from laboratory data and multiple parameters (Abolfathi et al., 2016). ANNs are also used to predict wave reflection and transmission coefficients (Zanuttigh et al., 2016; Formentin et al., 2017). Gradient boosting decision trees, a more recent ML technique, have been reported to improve the accuracy of predicting average wave overtopping discharges nearly threefold in comparison to traditional neural networks (den Bieman et al., 2020). Kernel-based approaches, such as Gaussian Process Regression (GPR) and SVR, have also been utilized in predicting wave overtopping, with GPR showing superior performance over ANNs and empirical formulas (Hosseinzadeh et al., 2021).
The measurement of Sea Surface Temperature (SST) is vital for understanding the global climate. It significantly contributes to climate modeling, weather forecasting, and studies on marine ecosystems. Accurately predicting SST can aid in mitigating the environmental harm resulting from rising water temperatures due to human-induced climate change. This prediction not only benefits marine ecosystems but also preserves coastal economies and the broader coastal environment (Choi et al., 2023). LSTM neural networks have proven effective in forecasting SST, showing enhanced performances when the right amount of input data is used (Xu et al., 2020). Multivariate LSTM models, which take into account factors such as wind speed and sea-level air pressure alongside SST, have demonstrated superior results compared to univariate models that only factor in SST (Balogun and Adebisi, 2021). Traditional ML models have been studied for spatio-temporal time series prediction, highlighting the importance of spatial data. Among these, the LSTM model emerged as the most efficient, showing a 25% improvement in forecasting performance (based on RMSE) when spatial information was incorporated (Kartal, 2023). Research indicates that LSTMs, whether using single or multiple variables, surpass other ML models in predicting SST (Xu et al., 2020; Kartal, 2023).
Moreover, accurate predictions of coastal sediment transport are crucial for managing coastal erosion and development, with researchers traditionally estimating sediment transport using experimental methods. Artificial intelligence-based methods can potentially improve decision-making for managing coastal erosion and development (Bakhtyar et al., 2008; Kabiri-Samani et al., 2011), provided that valid input data and appropriate activation functions are selected (Pourzangbar, 2012; Yeganeh-bakhtiary et al., 2012). Artificial intelligence and ML methods, such as Adaptive Network Based Fuzzy Inference Systems (ANFIS), Fuzzy Inference Systems (FIS), and ANNs, have been employed to model sediment transport alongside empirical formulas such as CERC (Coastal Engineering Research Center), Walton-Bruno (WB), and Van Rijn (VR), with ANFIS showing higher accuracy and reliability for estimating longshore sediment transport rates (LSTR) (Bakhtyar et al., 2008; Hashemi et al., 2010). SVR has also been employed, demonstrating superiority over neural networks when the dataset is small or the relationships are linear or non-linear but with a clear margin (Dezvareh and Shafaghat, 2020). Deep learning models, like ANNs, have been developed to address the shortcomings of numerical models in analyzing simultaneous sand and sediment transport (Kim and Aoki, 2021).
3.4.2 Classification models
Classification involves categorizing items or data into groups based on their features, and is crucial in fields such as statistics, ML and data analysis. The goal is to create models that predict the class of new items by identifying patterns in their features. SVM was introduced in the 1990s, RF in the early 2000s, and LR has roots going back to the 19th century. These algorithms are capable of executing simple tasks such as recognition and classification (Lou et al., 2021). In addition to these algorithms, a variety of other classification algorithms, including naive Bayes classifier, DTs, and K-Nearest Neighbors, have been utilized in remote sensing and in situ data analysis to enhance the understanding and monitoring of the environment. Table 7 summarizes the most well-known classification models used in coastal and marine engineering. These algorithms have proven effective in unraveling complex environmental data and facilitating informed decision-making (Tsiakos and Chalkias, 2023). Accordingly, the most widely used classification methods are:
• SVM (Cortes and Vapnik, 1995): focuses on the training samples near the optimal class boundary (the support vectors), aiming to maximize the margin between classes. Fundamentally, it is a binary classifier; multi-class problems are handled by applying the classifier to every class combination, which affects processing time.
• Regression Tree (RT) (Goldstein et al., 2019): breaks down prediction tasks into binary splits, forming a tree structure. This tool excels at classification tasks and enables an understanding of the influence of input variables. However, RTs may not be as effective for continuous variables and are prone to overfitting if not properly pruned. Accuracy can be boosted by merging small sequential RT models, giving more weight to poorly predicted data.
• Decision Trees (Pal and Mather, 2003): easy to understand, DTs recursively split data. They can use categorical data and perform classification quickly. However, DTs may suffer from overfitting and non-optimal solutions, which can be addressed through pruning.
• RF (Breiman, 2001): an ensemble classifier using multiple DTs to overcome their limitations. Each tree uses a random subset of training data and features, resulting in a more accurate ensemble. RF classifiers are known for their speed, resistance to overfitting, and ability to handle multicollinearity. They can also assess the importance of variables, although they may be sensitive to certain sampling strategies (Belgiu and Drăguţ, 2016).
• K-Nearest Neighbors (K-NN) classifier (Altman, 1992): the K-NN classifier is distinct from other classifiers because it does not create a model during the training phase. Instead, every unclassified sample is directly compared with the original training data.
• Naive Bayes classifier: a classification algorithm based on Bayes’ theorem that assumes the presence or absence of one feature is independent of the presence or absence of other features. It learns the probability distribution of features and corresponding labels from a training dataset and uses it to classify new examples. This algorithm is widely used in applications that have many features and large datasets, such as text classification, sentiment analysis, and spam filtering. The Naive Bayes classifier is computationally efficient and can handle high-dimensional data well.
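To make the K-NN description above concrete, the following is a minimal sketch in plain Python, not an implementation taken from any of the cited studies. The function name `knn_predict`, the two-band pixel values, and the class labels are hypothetical and chosen only for illustration. It shows that no model is fitted beforehand: each query is compared directly with the stored training samples.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training samples."""
    # No training step: distances to every stored sample are computed at query time.
    distances = [
        (math.dist(sample, query), label)
        for sample, label in zip(train_X, train_y)
    ]
    distances.sort(key=lambda pair: pair[0])
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Hypothetical two-band reflectance values for labelled pixels (illustrative only)
train_X = [(0.05, 0.02), (0.06, 0.03), (0.07, 0.02),
           (0.20, 0.35), (0.22, 0.40), (0.18, 0.38)]
train_y = ["water", "water", "water", "land", "land", "land"]

print(knn_predict(train_X, train_y, (0.06, 0.04)))
print(knn_predict(train_X, train_y, (0.21, 0.36)))
```

Because every prediction scans the full training set, the cost of this approach grows with the number of stored samples, which is one reason K-NN is usually paired with careful feature scaling and a modest training set.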
TABLE 7
| References | Method | Dataset | Inputs / Output(s) | Results |
|---|---|---|---|---|
| Heumann (2011) | Object-Based Image Analysis (OBIA) that melded a DT with SVM classification methods | Images from the Worldview-2 sensor | Vegetation field data / Mangrove associates | The study correctly identified true mangroves with over 94% accuracy. However, it struggled to map fringe mangroves due to spectral and zoning issues, especially in sparse or degraded areas. |
| Kalkan et al. (2013) | Object-based classification (OBC) and SVM | The Lakeland region of Turkey and Landsat 8 data | Coastline features | Automatic coastline extraction methods were compared to manual digitization, showing both methods achieved sub-pixel accuracy in detecting coastline features from Landsat 8 imagery. |
| Kong et al. (2017) | GS optimized SVM | 324 sampling sites collected across the Yellow Sea and East China Sea | DO, Chl-a, C1, C2, C3, and C4 and the TRIX index / Eutrophication status of coastal waters | The method demonstrated high predictive performance and accurate eutrophication status classification. The findings support the feasibility of using the SVM technique for rapid evaluation of eutrophication status with easily measured parameters. |
| Adam et al. (2014) | RF | A dataset from the 2010 KwaZulu-Natal provincial LULC map | RapidEye image / Land-use/cover (LULC) map | High spectral variation challenges RF and SVM in classifying certain LULC types, but incorporating the red-edge band significantly improves vegetation cover type classification accuracy. |
| Li and Wang (2011) | RF and Markov chain | 1998 to 2009 in Tianjin Environmental Aspect Bulletin | Time-series sea water quality / Sea water quality | Random Forests and Markov chain were used to fit a function relating transition probability to pollution and environmental investment, based on historical data. |
| Liu et al. (2021) | CNN | Coastal images and tidal data (20+ years) | Hourly coastal images and tidal data / Categorized beach states (8 classes) | CNN provides location and shape information of offshore dam, coastline, waves at the coastal dam, and trough data for classification decision-making. It has good generalization ability. |
| Hoonhout et al. (2015) | Structured Support Vector Machine (SSVM) | Manually annotated dataset of 192 coastal images | Coastal images / Pixel-wise classification (water, sand, vegetation, sky, object) | Pixel classification accuracy: 93.0%. Algorithm extracts beach widths and water lines from coastal camera images without manual quality control. It enables the analysis of large, long-term coastal imagery datasets and the application to various types of coastal images. Annotated dataset and open-source software are provided for free, promoting further research in coastal image analysis. |
| Shafaghat and Dezvareh (2021) | SVM | Coasts of Hormozgan province | Wave height, direction, period, and particle size / Sediment transport rate | SVM accurately categorizes sediment transport rate into critical and non-critical states for each beach, using a Gaussian kernel (RBF) and optimal coefficients of C = 9 and σ = 0.28. |
| Mahjoobi and Etemad-Shahidi (2008) | CART and C5 algorithm | 5 years (2000–2004) of wave and wind data from Lake Michigan | Wave and wind data / Significant wave height | The results of decision trees were compared to those of ANNs, showing similar error statistics. The decision tree approach is considered efficient and successful for predicting significant wave heights and offers the advantage of visualizing decision rules compared to neural networks. |
| Çelik and Gazioğlu (2022) | SVM, MLP and Ensemble Learning (EL) | Bedrock, beaches, and artificial coasts | Coastlines | Classifiers were accurate on unshaded bedrock coasts, and their results were similar. Extraction errors were encountered on bedrock coasts due to shadows, and MLP classifiers with Linear, Logarithmic, and Tanh activation functions were found to be the most accurate. Beach type coasts presented challenges due to shallow depths and suspended solids affecting classification accuracy. EL classifiers and SVMs with sigmoidal kernel function were adversely affected, but the best results were obtained by other SVMs and MLP classifiers. On artificial coasts, all classifiers provided accurate categorizations. |
| Shenbagaraj et al. (2014) | ISODATA (unsupervised classifiers) | Landsat TM and ETM+ sensor images, Toposheet and Google Earth Images over a 60-year period from 1953 to 2013 between Kolachel and Kayalpattanam | Coastline changes | This approach effectively identified the areas of coastline transgression and regression in the study area. |
| Rokni et al. (2015) | ANN; SVM; Maximum Likelihood | August 2000 to July 2010; Lake Urmia, Northwest of Iran | Fused images highlighting changed areas, classified maps | The proposed approach effectively detected surface water changes, especially when using the Gram Schmidt-ANN and Gram Schmidt-SVM techniques. The results show that Lake Urmia lost about one third of its surface area in the 2000–2010 period. |
| Sekovski et al. (2014) | Four supervised image classification techniques (Parallelepiped, Gaussian Maximum Likelihood, Minimum-Distance-to-Means, and Mahalanobis distance) and the unsupervised ISODATA | The satellite imagery is from 2011, and lidar data is from 2005. 40 km stretch of coastline in the Municipality of Ravenna, Northern Adriatic Sea, Italy. | High-resolution multispectral WorldView-2 satellite imagery from 2011, and airborne lidar data from 2005 / Delineated shorelines | Shorelines produced by ISODATA and Mahalanobis show the highest agreement with reference shorelines, having an average median distance of 2.2 m. Parallelepiped and Maximum Likelihood shorelines had the highest average median distance from the reference shoreline (5.1 and 5.6 m, respectively). Heterogeneous coastal stretches exhibited a larger offset between extracted and reference shorelines than homogeneous ones. The comparison between the Mahalanobis classification results and lidar data detected an erosive trend in a wide portion of the study area. |
Overview of highly-cited literature studies (extracted from Scopus) on classification models in coastal and marine phenomena.
The KNN classifier has been used in various marine-related projects. In the design of marine hydrokinetic turbines, KNN was used to identify and categorize the severity of the rotor blade pitch imbalance encountered by marine current turbines, and it proved useful for fault detection and severity classification (Freeman et al., 2021). In ocean surface current forecasting, KNN was used as an alternative method (Jirakittayakorn et al., 2017). The KNN algorithm proved capable of forecasting surface currents up to 24 h in advance and, when compared with other prediction techniques such as ARIMA, exponential smoothing, and LSTM, had the highest accuracy. KNN was also one of the six ML classifiers used to generate precise geographic estimates of seabed substrate for seabed habitat mapping (Diesing and Stephens, 2015; Leon et al., 2020). The accuracy of the predictions was evaluated using ground-truth sample data segmented into classes of seabed substrate. In coastal hazards projection, KNN was used to project hazards under several Representative Concentration Pathway (RCP) climate change scenarios, regional climate models, and sea level rise ratios (Park and Lee, 2020). Seafloor classification is another marine-related application, where KNN was used along with ANN to classify the structure of the seafloor and to pinpoint potential anthropogenic effects on delicate benthic assemblages (Gauci et al., 2016). Finally, in sea-land segmentation, KNN was used to produce a pixel-level sea-land segmentation of the scene based on the Doppler bandwidth of a returns vector in maritime surveillance radars (Shui et al., 2020).
The Naive Bayes classifier is a machine learning algorithm utilized in a variety of applications. A prominent application involves predicting water quality classes using seven popular Water Quality Index (WQI) models (Uddin et al., 2023). Because current WQI models use differing techniques, there is some confusion about the proper classification of water quality. To address this, Naive Bayes was compared with other ML classifiers, namely SVM, Random Forest, K-Nearest Neighbor, and Gradient Boosting, with the goal of determining the best classifier for evaluating water quality. Another application of the Naive Bayes classifier is in detecting small-scale assemblages of drifting vegetation and beach cast on Germany’s Baltic coast (Uhl et al., 2022). To obtain the best classification results, the classifier was used as part of an ensemble of five classifiers, which also included RF, CART, SVM, and a stochastic gradient boosting classifier. Naive Bayes has also been applied to predict tropical cyclones based on multi-model fusion across the Indian coastal region (Varalakshmi et al., 2021). In these applications, the Naive Bayes classifier was reported to be effective in improving model accuracy, particularly in predicting the quality of coastal water and detecting small-scale assemblages of drifting vegetation and beach cast, and its versatility makes it a popular choice across domains.
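As an illustration of how a Naive Bayes classifier learns feature distributions per class and then scores new samples, the sketch below implements a Gaussian variant in plain Python. It is an assumption-laden toy example, not the procedure used in the cited water quality studies: the function names, the two features (dissolved oxygen and chlorophyll-a), and all numeric values are hypothetical.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y):
    """Estimate the prior, per-feature mean and variance of each class."""
    grouped = defaultdict(list)
    for features, label in zip(X, y):
        grouped[label].append(features)
    model = {}
    for label, rows in grouped.items():
        n = len(rows)
        means = [sum(col) / n for col in zip(*rows)]
        # A small constant keeps the variance strictly positive.
        variances = [
            sum((v - m) ** 2 for v in col) / n + 1e-9
            for col, m in zip(zip(*rows), means)
        ]
        model[label] = (n / len(X), means, variances)
    return model


def predict_gaussian_nb(model, features):
    """Return the class with the highest log-posterior, treating features as independent."""
    best_label, best_score = None, -math.inf
    for label, (prior, means, variances) in model.items():
        score = math.log(prior)
        for v, m, var in zip(features, means, variances):
            score += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical samples: (dissolved oxygen in mg/L, chlorophyll-a in ug/L)
X = [(8.1, 2.0), (7.8, 2.5), (8.4, 1.8), (4.2, 12.0), (3.9, 15.0), (4.6, 11.0)]
y = ["good", "good", "good", "poor", "poor", "poor"]

model = fit_gaussian_nb(X, y)
print(predict_gaussian_nb(model, (8.0, 2.2)))
print(predict_gaussian_nb(model, (4.0, 13.0)))
```

Summing per-feature log-likelihoods is what the independence assumption amounts to in practice, and it is the reason the method stays inexpensive as the number of features grows.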
For coastline extraction from satellite images, three well-known groups of methods have been implemented: image processing techniques, unsupervised classifiers, and supervised classifiers. Shenbagaraj et al. (2014) employed visual interpretation and ISODATA (Iterative Self-Organizing Data Analysis Technique) classification techniques to extract shorelines from Landsat Thematic Mapper (TM), Enhanced Thematic Mapper Plus (ETM+) sensor images, Toposheet and Google Earth Images spanning a 60-year period from 1953 to 2013 between Kolachel and Kayalpattanam. This approach effectively identified the areas of coastline transgression and regression in the study area. Supervised classifiers such as Maximum Likelihood (Rokni et al., 2015), SVM, ANN, and EL (Çelik and Gazioğlu, 2022), RF (Bayram et al., 2017), Minimum-Distance-to-Means, and Mahalanobis distance (Sekovski et al., 2014) have also been employed to classify and detect the coastline position from satellite images. As depicted in Figure 9, the average median distance of each extracted shoreline from the reference shows that the shorelines produced by the ISODATA and Mahalanobis methods align best, with a discrepancy of 2.2 m. Conversely, the Parallelepiped and Maximum Likelihood methods resulted in shorelines with the highest average median distance from the reference shoreline, measuring 5.1 m and 5.6 m, respectively.
FIGURE 9
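The supervised classifiers listed above share a common workflow: learn class statistics from labelled pixels, label every pixel, then locate the water-land transition. The sketch below illustrates this with the simplest of them, Minimum-Distance-to-Means, using Euclidean distance. It is a toy example under stated assumptions and does not reproduce the processing chain of Sekovski et al. (2014); the function names, band values, and the cross-shore transect are hypothetical.

```python
import math


def class_means(training_pixels):
    """Compute the mean spectral vector of each class from labelled pixels."""
    return {
        label: [sum(band) / len(pixels) for band in zip(*pixels)]
        for label, pixels in training_pixels.items()
    }


def classify_pixel(pixel, means):
    """Assign the pixel to the class whose mean vector is nearest (Euclidean)."""
    return min(means, key=lambda label: math.dist(pixel, means[label]))


def first_land_index(labels):
    """Return the index of the first 'land' pixel along a sea-to-land transect."""
    for index, label in enumerate(labels):
        if label == "land":
            return index
    return None


# Hypothetical two-band training pixels (illustrative values only)
training_pixels = {
    "water": [(0.05, 0.02), (0.06, 0.03), (0.07, 0.02)],
    "land": [(0.20, 0.35), (0.22, 0.40), (0.18, 0.38)],
}
means = class_means(training_pixels)

# Hypothetical cross-shore transect, ordered from sea to land
transect = [(0.05, 0.02), (0.06, 0.03), (0.08, 0.05), (0.19, 0.33), (0.21, 0.37)]
labels = [classify_pixel(pixel, means) for pixel in transect]
shoreline = first_land_index(labels)
print(labels, shoreline)
```

Replacing the Euclidean distance with a covariance-weighted distance would give the Mahalanobis variant; that substitution is not shown here.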
4 Summary and conclusion
This study provides a comprehensive review of machine learning applications for modeling the marine and coastal environments, covering topics from data preprocessing to the application of different models. The review indicates that appropriately implemented and optimized ML methods can significantly contribute to marine and coastal sustainability by developing accurate and robust models for predicting wave height, oceanographic parameters, and sediment transport, as well as for image processing and for optimizing the design of coastal and marine structures.
The main insights from this review are as follows:
1. Dependence on data quality: the efficacy of ML models relies heavily on factors such as the quality of datasets, the type and configuration of the ML model, and the tuning parameters. This reemphasizes the importance of sound data science practices in applying ML.
2. Exploitation of data: this paper underlines the importance of data preprocessing, including data cleaning, dimensionality reduction, and normalization in machine learning models. This emphasizes the pivotal role of quality data in the effectiveness of ML applications in modelling phenomena such as wave patterns, coastal erosion, and sediment transport in marine and coastal environments.
3. Diverse machine learning approaches: this paper examined three primary types of ML, namely supervised, unsupervised, and reinforcement learning, and their respective applications in marine and coastal science. Supervised learning, using algorithms such as decision trees and neural networks, leverages labeled data to predict parameters like wave height and wind speed, and to make morphodynamic predictions. Unsupervised learning, on the other hand, independently discovers patterns and relationships in data for tasks like clustering and anomaly detection, and has been employed to classify wind values and examine beach profiles. Reinforcement learning, operating on a reward or penalty system, plays a vital role in devising control policies and planning for future scenarios in areas like flood risk reduction and wave energy conversion. Various ML methods such as PCR, LM, RT, and ANN are instrumental in facilitating these applications.
4. Classification algorithms: classification algorithms such as Kernel- and Tree-based models play crucial roles in environmental data interpretation and decision-making. SVM is known for its binary classification capabilities, while RT and DT provide swift classification and a better understanding of input variables. RF offers robustness against overfitting and efficiently manages multicollinearity. The KNN classifier performs well in comparing unclassified samples with training data. Naive Bayes, using Bayes’ theorem, efficiently processes and analyzes high-dimensional data and is often used in predicting water quality and tropical cyclone trajectories.
5. Application of ML: from forecasting oceanographic and morphologic parameters to estimating longshore sediment transport rates, the use of ML significantly enhances the capacity for prediction and understanding of marine and coastal environments. ANNs and SVR are frequently used for wave height predictions. Their accuracy and reliability help in crucial areas such as managing coastal erosion and development. The prediction of SST using ML, specifically LSTM neural networks, has shown great promise. Accurate SST prediction can contribute significantly to climate modeling, weather forecasting and the preservation of marine ecosystems. ANFIS has shown accuracy and reliability in estimating longshore sediment transport rates, which is essential for managing coastal erosion and development.
6. The growing role of new techniques: the rising prominence of deep neural networks, convolutional neural networks, and random forests is indicative of the evolution of the field, and the increasing complexity of the problems being addressed. These advanced techniques often deliver superior performance and can manage more complex and high-dimensional datasets. Integrated algorithms such as ICEEMDAN-ELM exhibit superior performance. The adoption of ML algorithms has seen a significant increase in recent years.
4.1 Recommendations for future research endeavours
• Developing hybrid models: the employment of combined and hybrid models has exhibited significant success, notably in addressing multifaceted issues. Eslaminezhad et al. (2022) advanced the efficiency of tree-structured machine learning models in determining the crucial parameters for forecasting flood susceptibility and constructing flood susceptibility maps, through the incorporation of the BPSO algorithm.
• Developing physics-based machine learning: machine learning models often do not adequately consider the actual physical elements of the problem. Consequently, integrating physics-based machine learning approaches is recommended for further consideration.
• Implementing domain adaptation techniques: to address the regional restrictions inherent in existing models, it might be prudent to consider the application of domain adaptation techniques.
• Evaluating models’ uncertainty: it is essential to acknowledge that inherent uncertainty is a fundamental aspect of any model. Thus, it is proposed that the models’ uncertainty be consistently documented, and appropriate methodologies be utilized to alleviate it.
• Development of appropriate scaling techniques: by developing appropriate scaling techniques, one ensures that all features contribute equally to the final prediction, thereby improving the performance of the machine learning model.
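The scaling recommendation can be illustrated with the two standard transformations implied by the glossary symbols (minimum and maximum of X, mean μ, and standard deviation σ). The sketch below is a plain-Python illustration, not a scaling technique proposed by the review; the function names and the wave height values are hypothetical.

```python
from statistics import mean, pstdev


def min_max_scale(values):
    """Rescale values linearly to the range [0, 1]."""
    lowest, highest = min(values), max(values)
    if highest == lowest:
        # A constant feature carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - lowest) / (highest - lowest) for v in values]


def z_score_scale(values):
    """Standardize values to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical significant wave heights in metres (illustrative only)
wave_heights = [0.5, 1.0, 1.5, 2.0, 2.5]

scaled = min_max_scale(wave_heights)
standardized = z_score_scale(wave_heights)
print(scaled)
print(standardized)
```

In practice the minimum, maximum, mean, and standard deviation would be computed on the training split only and then reused for validation and test data, so that no information leaks from the held-out sets.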
Statements
Author contributions
AP: Supervision, Compilation and Integration of Data, Data Curation, Software, Validation, Visualization, Writing—Review and Editing. MJ: Literature Search, Information Provision, Writing—Review and Editing. MB: Supervision, Writing—Review and Editing, Funding Acquisition, Project Administration. All authors contributed to the article and approved the submitted version.
Conflict of interest
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Publisher’s note
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
Glossary
| Abbreviation/symbol | Definition |
|---|---|
| AI | Artificial Intelligence |
| ANFIS | Adaptive Network Based Fuzzy Inference Systems |
| ANN | Artificial Neural Network |
| ANN-MLP | Artificial Neural Network (Multilayer Perceptron) |
| ANN-RBF | Artificial Neural Network (Radial Basis Function) |
| ANOVA | Analysis of Variance |
| BMA | Bayesian Model Averaging method |
| CART | Classification And Regression Trees |
| CERC | Coastal Engineering Research Center |
| CNN | Convolutional Neural Networks |
| DA | Dimensional Analysis |
| DBSCAN | Density-Based Spatial Clustering of Applications with Noise |
| DEM | Digital Elevation Model |
| DFA | Dynamic Factor Analysis |
| DNN | Deep Neural Networks |
| DRL | Deep Reinforcement Learning |
| DT | Decision Tree |
| ELM | Extreme Learning Machine |
| ELMOR | Extreme Learning Machine for Ordinal Regression |
| Ens-ELM | Ensemble of Extreme Learning Machine |
| FIS | Fuzzy Inference System |
| FL | Fuzzy Logic |
| FUNWAVE | Fully Nonlinear Boussinesq Wave model |
| GAP-RBF | Growing and Pruning Radial Basis Function |
| GBM | Generalized Boosted Regression |
| GCM | Global Climate Model |
| GGA-ELM | Grouping Genetic Algorithm—Extreme Learning Machine approach |
| GIS | Geographic Information Systems |
| GP | Genetic Programming |
| GPR | Gaussian Process Regression |
| GRNN | General Regression Neural Networks |
| HBP | Hierarchical Bayesian Model |
| ICA | Independent Component Analysis |
| ICEEMDAN-ELM | Improved Complete Ensemble Empirical Mode Decomposition with Adaptive Noise-Extreme Learning Machine |
| ISODATA | Iterative Self-Organizing Data Analysis Technique |
| KDLOR | Kernel Discriminant Learning for Ordinal Regression |
| KNN | K-Nearest Neighbors |
| LASSO | Least Absolute Shrinkage and Selection Operator |
| LiDAR | Light Detection and Ranging |
| LM | Linear Model |
| LSTM | Long Short-Term Memory model |
| LSTM neural network | Long Short-Term Memory neural network |
| LSTR | Longshore Sediment Transport Rates |
| MAFA | Min/Max Autocorrelation Factor Analysis |
| MDP | Markov Decision Process |
| ML | Machine Learning |
| MLR | Multiple Linear Regression |
| MRAN | Minimal Resource Allocation Network |
| NLP | Natural Language Processing |
| NSWE | Nonlinear Shallow Water Equations |
| ONN | Ordinal Neural Networks |
| OS-ELM | Online Sequential Extreme Learning Machine |
| PCA | Principal Component Analysis |
| POM | Proportional Odds Model |
| Q-Learning | A model-free reinforcement learning algorithm |
| RCM | Regional Climate Model |
| RF | Random Forest |
| RL | Reinforcement Learning |
| RNN | Recurrent Neural Network |
| RT | Regression Tree |
| SLFN | Single Layer Feedforward Neural Network |
| SST | Sea Surface Temperature |
| SVM | Support Vector Machine |
| SVM (polynomial) | Support Vector Machine (Polynomial) |
| SVM-RBF | Support Vector Machine (Radial Basis Function) |
| SVOR-EX | Support Vector Ordinal Regression with Explicit constraints |
| SVOR-IM | Support Vector Ordinal Regression with Implicit constraints |
| SVR | Support Vector Regression |
| SWAN | Simulating WAves Nearshore model |
| TOL | Tolerance |
| VIF | Variance Inflation Factor |
| VR | Van Rijn formula |
| WB | Walton-Bruno formula |
| X | Vector of original data |
| μ | Mean |
| σ | Standard Deviation |
| | Individual data point in X |
| | Normalized data |
| | Minimum value of X |
| | Maximum value of X |
References
1
AbolfathiS.Yeganeh-BakhtiaryA.Hamze-ZiabariS. M.BorzooeiS. (2016). Wave runup prediction using M5′ model tree algorithm. Ocean. Eng.112, 76–81. 10.1016/J.OCEANENG.2015.12.016
2
AdamE.MutangaO.OdindiJ.Abdel-RahmanE. M. (2014). Land-use/cover classification in a heterogeneous coastal landscape using RapidEye imagery: evaluating the performance of random forest and support vector machines classifiers. Int. J. Remote Sens.35, 3440–3458. 10.1080/01431161.2014.903435
3
AfsarianF.SaberA.PourzangbarA.OlabiA. G.KhanmohammadiM. A. (2018). Analysis of recycled aggregates effect on energy conservation using M5″ model tree algorithm. Energy156, 264–277. 10.1016/j.energy.2018.05.099
4
AgarwalP.ManuelL. (2008). Extreme loads for an offshore wind turbine using statistical extrapolation from limited field data. Wind Energy11, 673–684. 10.1002/we.301
5
AgrafiotisP.SkarlatosD.GeorgopoulosA.KarantzalosK. (2019). DepthLearn: learning to correct the refraction on point clouds derived from aerial imagery for accurate dense shallow water bathymetry based on SVMs-fusion with LiDAR point clouds. Remote Sens.11, 2225. 10.3390/rs11192225
6
AkbarifardS.RadmaneshF. (2018). Predicting sea wave height using Symbiotic Organisms Search (SOS) algorithm. Ocean. Eng.167, 348–356. 10.1016/J.OCEANENG.2018.04.092
7
AliM.PrasadR. (2019). Significant wave height forecasting via an extreme learning machine model integrated with improved complete ensemble empirical mode decomposition. Renew. Sustain. Energy Rev.104, 281–295. 10.1016/J.RSER.2019.01.014
8
AliM.PrasadR.XiangY.JameiM.YaseenZ. M. (2023). Ensemble robust local mean decomposition integrated with random forest for short-term significant wave height forecasting. Renew. Energy205, 731–746. 10.1016/J.RENENE.2023.01.108
9
AltmanN. S. (1992). An introduction to kernel and nearest-neighbor nonparametric regression. Am. Statistician46, 175–185. 10.1080/00031305.1992.10475879
10
AnderliniE.HusainS.ParkerG. G.AbusaraM.ThomasG. (2020). Towards real-time reinforcement learning control of a wave energy converter. J. Mar. Sci. Eng.8, 845. 10.3390/jmse8110845
11
ArslanO.AkyürekÖ.KayaŞ.ŞekerD. Z. (2020). Dimension reduction methods applied to coastline extraction on hyperspectral imagery. Geocarto Int.35, 376–390. 10.1080/10106049.2018.1520920
12
Ba-AlawiA. H.VilelaP.Loy-BenitezJ.HeoS. K.YooC. K. (2021). Intelligent sensor validation for sustainable influent quality monitoring in wastewater treatment plants using stacked denoising autoencoders. J. Water Process Eng.43, 102206. 10.1016/j.jwpe.2021.102206
13
BabooS. S.TajudinK. (2013). “Clustering centroid finding algorithm (CCFA) using spatial temporal data mining concept,” in 2013 International Conference on Pattern Recognition, Informatics and Mobile Engineering 2013, Salem, India, February 21–22, 2013 (IEEE (Institute of Electrical and Electronics Engineers), 30–36. 10.1109/ICPRIME.2013.6496443
14
BaiY.CaiW. J.HeX.ZhaiW.PanD.DaiM.et al (2015). A mechanistic semi-analytical method for remotely sensing Sea Surface pCO2 in river-dominated coastal oceans: a case study from the east China sea. J. Geophys. Res. Oceans120, 2331–2349. 10.1002/2014JC010632
15
BakhtyarR.GhaheriA.Yeganeh-BakhtiaryA.BaldockT. E. (2008). Longshore sediment transport estimation using a fuzzy inference system. Appl. Ocean Res.30, 273–286. 10.1016/J.APOR.2008.12.001
16
BalogunA. L.AdebisiN. (2021). Sea level prediction using ARIMA, SVR and LSTM neural network: assessing the impact of ensemble ocean-atmospheric processes on models’ accuracy. Geomatics, Nat. Hazards Risk12, 653–674. 10.1080/19475705.2021.1887372
17
BateniS. M.BorgheiS. M.JengD. S. (2007). Neural network and neuro-fuzzy assessments for scour depth around bridge piers. Eng. Appl. Artif. Intell.20, 401–414. 10.1016/j.engappai.2006.06.012
18
BayramB.ErdemF.AkpinarB.InceA. K.BozkurtS.Catal ReisH.et al (2017). The efficiency of random forest method for shoreline extraction from landsat-8 and gokturk-2 imageries. ISPRS Ann. Photogrammetry, Remote Sens. Spatial Inf. Sci., 141–145. 10.5194/isprs-annals-IV-4-W4-141-2017
19
BelgiuM.DrăguţL. (2016). Random forest in remote sensing: a review of applications and future directions. ISPRS J. Photogrammetry Remote Sens.114, 24–31. 10.1016/J.ISPRSJPRS.2016.01.011
20
BerbićJ.OcvirkE.CarevićD.LončarG. (2017). Application of neural networks and support vector machine for significant wave height prediction. Oceanologia59, 331–349. 10.1016/J.OCEANO.2017.03.007
21
BeuzenT.GoldsteinE. B.SplinterK. D. (2019). Ensemble models from machine learning: an example of wave runup and coastal dune erosion. Nat. Hazards Earth Syst. Sci.19, 2295–2309. 10.5194/nhess-19-2295-2019
22
BishopC. M.NasrabadiN. M. (2006). Pattern recognition and machine learning. Springer.
23
BooijN.HolthuijsenL. H.RisR. C. (1997). The swan wave model for shallow water. Coast. Eng. 1996. 10.1061/9780784402429.053
24
BowesB. D.TavakoliA.WangC.HeydarianA.BehlM.BelingP. A.et al (2021). Flood mitigation in coastal urban catchments using real-time stormwater infrastructure control and reinforcement learning. J. Hydroinformatics23, 529–547. 10.2166/HYDRO.2020.080
25
BowesB. D.WangC.ErcanM. B.CulverT. B.BelingP. A.GoodallJ. L. (2022). Reinforcement learning-based real-time control of coastal urban stormwater systems to mitigate flooding and improve water quality. Environ. Sci. Water Res. Technol.8, 2065–2086. 10.1039/d1ew00582k
26
BreimanL. (2001). Random forests. Mach. Learn.45, 5–32. 10.1023/A:1010933404324
27
BrownJ. M.YellandM. J.PullenT.SilvaE.MartinA.GoldI.et al (2021). Novel use of social media to assess and improve coastal flood forecasts and hazard alerts. Sci. Rep.11, 13727. 10.1038/s41598-021-93077-z
28
ÇelikO. İ.GazioğluC. (2022). Coast type based accuracy assessment for coastline extraction from satellite image with machine learning classifiers. Egypt. J. Remote Sens. Space Sci.25, 289–299. 10.1016/J.EJRS.2022.01.010
29
ChandolaV.BanerjeeA.KumarV. (2009). Anomaly detection: a survey. ACM Comput. Surv.41, 1–58. 10.1145/1541880.1541882
30
ChenJ.AbbadyS.DuggimpudiM. B. (2016). Spatiotemporal outlier detection: did buoys tell where the hurricanes were?Pap. Appl. Geogr.2, 298–314. 10.1080/23754931.2016.1149874
31
ChenW.MaW. (2010). “Applications based on genetic neural network model of Lianyungang marine water quality optimization techniques and algorithms Technology”. 2010 International Conference of Information Science and Management Engineering 20101, 526–529. 10.1109/ISME.2010.253
32
ChoH. Y.OhJ. H.KimK. O.ShimJ. S. (2013). Outlier detection and missing data filling methods for coastal water temperature data. J. Coast. Res.165, 1898–1903. 10.2112/si65-321.1
33
ChoiH. M.KimM. K.YangH. (2023). Deep-learning model for sea surface temperature prediction near the Korean Peninsula. Deep Sea Res. Part II Top. Stud. Oceanogr.208, 105262. 10.1016/J.DSR2.2023.105262
34
CiortanS.RusuE. (2018). Prediction of the wave power in the Black Sea based on wind speed using artificial neural networks. E3S Web Conf.51, 01006. 10.1051/e3scconf/20185101006
35
Copernicus (2023). The copernicus marine service. Available at: https://marine.copernicus.eu/about.
36
Cornejo-BuenoL.Nieto-BorgeJ. C.García-DíazP.RodríguezG.Salcedo-SanzS. (2016). Significant wave height and energy flux prediction for marine energy applications: a grouping genetic algorithm – extreme learning machine approach. Renew. Energy97, 380–389. 10.1016/J.RENENE.2016.05.094
37
CortesC.VapnikV. (1995). Support-vector networks. Mach. Learn.20, 273–297. 10.1023/A:1022627411411
38
CuadraL.Salcedo-SanzS.Nieto-BorgeJ. C.AlexandreE.RodríguezG. (2016). Computational intelligence in wave energy: comprehensive review and case study. Renew. Sustain. Energy Rev.58, 1223–1246. 10.1016/j.rser.2015.12.253
39
DarandaA.DzemydaG. (2020). Navigation decision support: discover of vessel traffic anomaly according to the historic marine data. Int. J. Comput. Commun. CONTROL15. 10.15837/IJCCC.2020.3.3864
40
DavidsonM. A.BirdP. A. D.BullockG. N.HuntleyD. A. (1996). A new non-dimensional number for the analysis of wave reflection from rubble mound breakwaters. Coast. Eng.28, 93–120. 10.1016/0378-3839(96)00012-9
41
DemetriouD.MichailidesC.PapanastasiouG.OnoufriouT. (2021). Coastal zone significant wave height prediction by supervised machine learning classification algorithms. Ocean. Eng.221, 108592. 10.1016/j.oceaneng.2021.108592
42
den BiemanJ. P.WilmsJ. M.van den BoogaardH. F. P.van GentM. R. A. (2020). Prediction of mean wave overtopping discharge using gradient boosting decision trees. Water12. 10.3390/W12061703
43
DengY.YangJ.ZhaoW.LiX.XiaoL. (2016). Freak wave forces on a vertical cylinder. Coast. Eng.114, 9–18. 10.1016/j.coastaleng.2016.03.007
44
DeroliyaP.GhoshM.MohantyM. P.GhoshS.RaoK. D.KarmakarS. (2022). A novel flood risk mapping approach with machine learning considering geomorphic and socio-economic vulnerability dimensions. Sci. Total Environ.851, 158002. 10.1016/j.scitotenv.2022.158002
45
DezvarehR.ShafaghatM. (2020). Predicting the sediment rate of Nakhilo Port using artificial intelligence. Int. J. Coast. offshore Eng.4 (2), 41–49. 10.22034/IJCOE.2020.149345
46
DiZ.ChangM.GuoP.LiY.ChangY. (2019). Using real-time data and unsupervised machine learning techniques to study large-scale spatio-temporal characteristics of wastewater discharges and their influence on surface water quality in the Yangtze River Basin. Water (Switzerland)11, 1268. 10.3390/w11061268
47
DiesingM.StephensD. (2015). A multi-model ensemble approach to seabed mapping. J. Sea Res.100, 62–69. 10.1016/j.seares.2014.10.013
48
DoganG.FordM.JamesS. (2021). “Predicting ocean-wave conditions using buoy data supplied to a hybrid RNN-LSTM neural network and machine learning models,” in Proceedings of the 2021 IEEE International Conference on Machine Learning and Applied Network Technologies, Soyapango, El Salvador, December 16–17, 2021 (IEEE (Institute of Electrical and Electronics Engineers)). ICMLANT 2021. 10.1109/ICMLANT53170.2021.9690528
49
DonnellyJ.AbolfathiS.PearsonJ.ChatrabgounO.DaneshkhahA. (2022). Gaussian process emulation of spatio-temporal outputs of a 2D inland flood model. Water Res.225, 119100. 10.1016/j.watres.2022.119100
50
DuongN. T.TranK. Q.LuuL. X.TranL. H. (2023). Prediction of breaking wave height by using artificial neural network-based approach. Ocean. Model.182, 102177. 10.1016/J.OCEMOD.2023.102177
51
El-HaddadB. A.YoussefA. M.PourghasemiH. R.PradhanB.El-ShaterA. H.El-KhashabM. H. (2021). Flood susceptibility prediction using four machine learning techniques and comparison of their performance at Wadi Qena Basin, Egypt. Nat. Hazards105, 83–114. 10.1007/s11069-020-04296-y
52
El-RahmanS. A. (2016). “Hyperspectral imaging classification using ISODATA algorithm: big data challenge,” in Proceedings - 2015 5th International Conference on e-Learning, Manama, Bahrain, October 18–20, 2015 (IEEE (Institute of Electrical and Electronics Engineers)). 10.1109/ECONF.2015.39
53
ElsayedS.IbrahimH.HusseinH.ElsherbinyO.ElmetwalliA. H.MoghanmF. S.et al (2021). Assessment of water quality in Lake Qaroun using ground-based remote sensing data and artificial neural networks. Water13, 3094. 10.3390/w13213094
54
EmmanouilS.AguilarS. G.NaneG. F.SchoutenJ. J. (2020). Statistical models for improving significant wave height predictions in offshore operations. Ocean. Eng.206, 107249. 10.1016/j.oceaneng.2020.107249
55
EnnoualiZ.FannassiY.LahssiniG.BenmohammadiA.MasriaA. (2023). Mapping coastal vulnerability using machine learning algorithms: a case study at north coastline of Sebou estuary, Morocco. Regional Stud. Mar. Sci.60, 102829. 10.1016/J.RSMA.2023.102829
56
EsterM.KriegelH.-P.SanderJ.XuX. (1996). “A density-based algorithm for discovering clusters in large spatial databases with noise,” in Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining.
57
EwuzieU.AkuN. O.NwankpaS. U. (2021). An appraisal of data collection, analysis, and reporting adopted for water quality assessment: a case of Nigeria water quality research. Heliyon7, e07950. 10.1016/J.HELIYON.2021.E07950
58
EyringV.BonyS.MeehlG. A.SeniorC. A.StevensB.StoufferR. J.et al (2016). Overview of the coupled model intercomparison project phase 6 (CMIP6) experimental design and organization. Geosci. Model. Dev.9, 1937–1958. 10.5194/gmd-9-1937-2016
59
FanS.XiaoN.DongS. (2020). A novel model to predict significant wave height based on long short-term memory network. Ocean. Eng.205, 107298. 10.1016/J.OCEANENG.2020.107298
60
FernándezJ. C.Salcedo-SanzS.GutiérrezP. A.AlexandreE.Hervás-MartínezC. (2015). Significant wave height and energy flux range forecast with machine learning classifiers. Eng. Appl. Artif. Intell.43, 44–53. 10.1016/J.ENGAPPAI.2015.03.012
61
FormentinS. M.ZanuttighB.Van Der MeerJ. W. (2017). A neural network tool for predicting wave reflection, overtopping and transmission. Coast. Eng. J.59, 1750006-1–1750006-31. 10.1142/S0578563417500061
62
FreemanB.TangY.HuangY.VanZwietenJ. (2021). Rotor blade imbalance fault detection for variable-speed marine current turbines via generator power signal analysis. Ocean. Eng.223, 108666. 10.1016/j.oceaneng.2021.108666
63
GandomiM.Dolatshahi PiroozM.VarjavandI.NikooM. R. (2020). Permeable breakwaters performance modeling: a comparative study of machine learning techniques. Remote Sens.12, 1856. 10.3390/rs12111856
64
GauciA.DeidunA.AbelaJ.Zarb AdamiK. (2016). Machine learning for benthic sand and maerl classification and coverage estimation in coastal areas around the Maltese Islands. J. Appl. Res. Technol.14, 338–344. 10.1016/j.jart.2016.08.003
65
GawehnM.van DongerenA.de VriesS.SwinkelsC.HoekstraR.AarninkhofS.et al (2020). The application of a radar-based depth inversion method to monitor near-shore nourishments on an open sandy coast and an ebb-tidal delta. Coast. Eng.159, 103716. 10.1016/J.COASTALENG.2020.103716
66
GoldsteinE. B.CocoG.PlantN. G. (2019). A review of machine learning applications to coastal sediment transport and morphodynamics. Earth-Science Rev.194, 97–108. 10.1016/j.earscirev.2019.04.022
67
GomezC. (2022). “Point-cloud technology for coastal and floodplain geomorphology,” in Point cloud technologies for geomorphologists from data acquisition to processing (Springer), 53–81. 10.1007/978-3-031-10975-1
68
GünaydinK. (2008). The estimation of monthly mean significant wave heights by using artificial neural network and regression methods. Ocean. Eng.35, 1406–1415. 10.1016/J.OCEANENG.2008.07.008
69
HagenaarsG.de VriesS.LuijendijkA. P.de BoerW. P.ReniersA. J. H. M. (2018). On the accuracy of automated shoreline detection derived from satellite imagery: a case study of the sand motor mega-scale nourishment. Coast. Eng.133, 113–125. 10.1016/J.COASTALENG.2017.12.011
70
HallJ. W.MeadowcroftI. C.LeeE. M.Van GelderP. H. A. J. M. (2002). Stochastic simulation of episodic soft coastal cliff recession. Coast. Eng.46, 159–174. 10.1016/S0378-3839(02)00089-3
71
HashemiM. R.GhadampourZ.NeillS. P. (2010). Using an artificial neural network to model seasonal changes in beach profiles. Ocean. Eng.37, 1345–1356. 10.1016/J.OCEANENG.2010.07.004
72
HessamiM.GachonP.OuardaT. B. M. J.St-HilaireA. (2008). Automated regression-based statistical downscaling tool. Environ. Model. Softw.23, 813–834. 10.1016/J.ENVSOFT.2007.10.004
73
HeumannB. W. (2011). An object-based classification of mangroves using a hybrid decision tree-support vector machine approach. Remote Sens.3, 2440–2460. 10.3390/rs3112440
74
HodgeV. J.AustinJ. (2004). A survey of outlier detection methodologies. Artif. Intell. Rev.22, 85–126. 10.1023/b:aire.0000045502.10941.a9
75
HongJ.-H.ChiewY.-M.ChengN.-S. (2013). Scour caused by a propeller jet. J. Hydraul. Eng.139, 1003–1012. 10.1061/(asce)hy.1943-7900.0000746
76
HoonhoutB. M.RadermacherM.BaartF.van der MaatenL. J. P. (2015). An automated method for semantic classification of regions in coastal images. Coast. Eng.105, 1–12. 10.1016/j.coastaleng.2015.07.010
77
HosseinzadehS.Etemad-ShahidiA.KooshehA. (2021). Prediction of mean wave overtopping at simple sloped breakwaters using kernel-based methods. J. Hydroinformatics23, 1030–1049. 10.2166/hydro.2021.046
78
HotellingH. (1933). Analysis of a complex of statistical variables into principal components. J. Educ. Psychol.24, 417–441. 10.1037/h0071325
79
HuaX. G.NiY. Q.KoJ. M.WongK. Y. (2007). Modeling of temperature–frequency correlation using combined principal component analysis and support vector regression technique. J. Comput. Civ. Eng.21, 122–135. 10.1061/(asce)0887-3801(2007)21:2(122)
80
HuangD.ZhaoD.WeiL.WangZ.DuY. (2015). Modeling and analysis in marine big data: advances and challenges. Math. Problems Eng.2015, 1–13. 10.1155/2015/384742
81
IrishJ. L.WhiteT. E. (1998). Coastal engineering applications of high-resolution lidar bathymetry. Coast. Eng.35, 47–71. 10.1016/S0378-3839(98)00022-2
82
IzadiM.SultanM.KadiriR. E.GhannadiA.AbdelmohsenK. (2021). A remote sensing and machine learning-based approach to forecast the onset of harmful algal bloom. Remote Sens.13, 3863. 10.3390/rs13193863
83
JainP.DeoM. C. (2008). Artificial intelligence tools to forecast ocean waves in real time. Open Ocean Eng. J.1, 13–20. 10.2174/1874835x00801010013
84
JamesS. C.ZhangY.O’DonnchaF. (2018). A machine learning framework to forecast wave conditions. Coast. Eng.137, 1–10. 10.1016/j.coastaleng.2018.03.004
85
JayaratneM. P. R.PremaratneB.AdewaleA.MikamiT.MatsubaS.ShibayamaT.et al (2016). Failure mechanisms and local scour at coastal structures induced by Tsunami. Coast. Eng. J.58, 1640017-1–1640017-38. 10.1142/S0578563416400179
86
JiangT.GradusJ. L.RoselliniA. J. (2020). Supervised machine learning: a brief primer. Behav. Ther.51, 675–687. 10.1016/J.BETH.2020.05.002
87
JirakittayakornA.KormongkolkulT.VateekulP.JitkajornwanichK.LawawirojwongS. (2017). “Temporal kNN for short-Term ocean current prediction based on HF radar observations,” in Proceedings of the 2017 14th International Joint Conference on Computer Science and Software Engineering, NakhonSiThammarat, Thailand, July 12–14, 2017 (IEEE (Institute of Electrical and Electronics Engineers)). 10.1109/JCSSE.2017.8025921
88
JoyceK. E.FickasK. C.KalamandeenM. (2023). The unique value proposition for using drones to map coastal ecosystems. Camb. Prisms Coast. Futur.1, e6. 10.1017/cft.2022.7
89
Kabiri-SamaniA. R.Aghaee-TarazjaniJ.BorgheiS. M.JengD. S. (2011). Application of neural networks and fuzzy logic models to long-shore sediment transport. Appl. Soft Comput.11, 2880–2887. 10.1016/J.ASOC.2010.11.021
90
KagemotoH. (2020). Forecasting a water-surface wave train with artificial intelligence - a case study. Ocean. Eng.207, 107380. 10.1016/J.OCEANENG.2020.107380
91
KalkanK.BayramB.MaktavD.SunarF. (2013). “Comparison of support vector machine and object based classification methods for coastline detection,” in International archives of the photogrammetry, remote sensing and spatial information sciences - ISPRS archives. 10.5194/isprsarchives-XL-7-W2-125-2013
92
KaloopM. R.KumarD.ZarzouraF.RoyB.HuJ. W. (2020). A wavelet - particle swarm optimization - extreme learning machine hybrid modeling for significant wave height prediction. Ocean. Eng.213, 107777. 10.1016/J.OCEANENG.2020.107777
93
KaplanD.Muñoz-CarpenaR.RitterA. (2010). Untangling complex shallow groundwater dynamics in the floodplain wetlands of a southeastern U.S. coastal river. Water Resour. Res.46. 10.1029/2009WR009038
94
KarimpourA.ChenQ.TwilleyR. R. (2016). A field study of how wind waves and currents may contribute to the deterioration of saltmarsh fringe. Estuaries Coasts39, 935–950. 10.1007/s12237-015-0047-z
95
KartalS. (2023). Assessment of the spatiotemporal prediction capabilities of machine learning algorithms on sea surface temperature data: a comprehensive study. Eng. Appl. Artif. Intell.118, 105675. 10.1016/J.ENGAPPAI.2022.105675
96
KelleherJ. D.Mac NameeB.D’ArcyA. (2015). Fundamentals of machine learning for predictive data analytics: algorithms, worked examples, and case studies.
97
KimH. D.AokiS. I. (2021). Artificial intelligence application on sediment transport. J. Mar. Sci. Eng.9, 600. 10.3390/jmse9060600
98
KimJ.KimJ. (2020). Estimation of water surface flow velocity in coastal video imagery by visual tracking with deep learning. J. Coast. Res.95, 522. 10.2112/SI95-101.1
99
KimJ.KimJ.KimT.HuhD.CairesS. (2020). Wave-tracking in the surf zone using coastal video imagery with deep neural networks. Atmos. (Basel)11, 304. 10.3390/atmos11030304
100
KimT.KwonY.LeeJ.LeeE.KwonS. (2022). Wave attenuation prediction of artificial coral reef using machine-learning integrated with hydraulic experiment. Ocean. Eng.248, 110324. 10.1016/J.OCEANENG.2021.110324
101
KitsikoudisV.SidiropoulosE.HrissanthouV. (2015). Assessment of sediment transport approaches for sand-bed rivers by means of machine learning. Hydrological Sci. J.60, 1566–1586. 10.1080/02626667.2014.909599
102
KnightP. J.BirdC. O.SinclairA.PlaterA. J. (2020). A low-cost GNSS buoy platform for measuring coastal sea levels. Ocean. Eng.203, 107198. 10.1016/J.OCEANENG.2020.107198
103
KongX.SunY.SuR.ShiX. (2017). Real-time eutrophication status evaluation of coastal waters using support vector machine with grid search algorithm. Mar. Pollut. Bull.119, 307–319. 10.1016/J.MARPOLBUL.2017.04.022
104
KramerO. (2013). Dimensionality reduction with unsupervised nearest neighbors. Intell. Syst. Ref. Libr.51. 10.1007/978-3-642-38652-7
105
KroonA.LarsonM.MöllerI.YokokiH.RozynskiG.CoxJ.et al (2008). Statistical analysis of coastal morphological data sets over seasonal to decadal time scales. Coast. Eng.55, 581–600. 10.1016/j.coastaleng.2007.11.006
106
KumarN. K.SavithaR.MamunA. A. (2017). Regional ocean wave height prediction using sequential learning neural networks. Ocean. Eng.129, 605–612. 10.1016/J.OCEANENG.2016.10.033
107
KumarN. K.SavithaR.MamunA. A. (2018). Ocean wave height prediction using ensemble of Extreme Learning Machine. Neurocomputing277, 12–20. 10.1016/J.NEUCOM.2017.03.092
108
KuntojiG.RaoM.RaoS. (2020). Prediction of wave transmission over submerged reef of tandem breakwater using PSO-SVM and PSO-ANN techniques. ISH J. Hydraulic Eng.26, 283–290. 10.1080/09715010.2018.1482796
109
KuoY. M.LiuW.ZhaoE.LiR.Muñoz-CarpenaR. (2019). Water quality variability in the middle and down streams of Han River under the influence of the Middle Route of South-North Water diversion project, China. J. Hydrology569, 218–229. 10.1016/j.jhydrol.2018.12.001
110
LatifS. D.ChongK. L.AhmedA. N.HuangY. F.SherifM.El-ShafieA. (2023). Sediment load prediction in Johor River: deep learning versus machine learning models. Appl. Water Sci.13, 79. 10.1007/s13201-023-01874-w
111
LazuardiW.ArdiyantoR.MarfaiM. A.MutaqinB. W.KusumaD. W. (2021). Coastal reef and seagrass monitoring for coastal ecosystem management. Int. J. Sustain. Dev. Plan.16, 557–568. 10.18280/IJSDP.160317
112
LeeT.-L. (2004). Back-propagation neural network for long-term tidal predictions. Ocean. Eng.31, 225–238. 10.1016/S0029-8018(03)00115-X
113
LiS. P.WangH. L. (2011). “Control stratory in coastal area using Markov chain and Random Forest,” in 2011 IEEE 18th International Conference on Industrial Engineering and Engineering Management, IE and EM 2011, Changchun, China, September 3–5, 2011 (IEEE (Institute of Electrical and Electronics Engineers)). 10.1109/ICIEEM.2011.6035480
114
LiuB.YangB.Masoud-AnsariS.WangH.GaheganM. (2021). Coastal image classification and pattern recognition: Tairua beach, New Zealand. Sensors21, 7352. 10.3390/s21217352
115
LouR.LvZ.DangS.SuT.LiX. (2021). “Application of machine learning in ocean data,” in Multimedia systems. 10.1007/s00530-020-00733-x
116
MacayealD. R.AbbotD. S.SergienkoO. V. (2011). Iceberg-capsize tsunamigenesis. Ann. Glaciol.52, 51–56. 10.3189/172756411797252103
117
MahjoobiJ.Adeli MosabbebE. (2009). Prediction of significant wave height using regressive support vector machines. Ocean. Eng.36, 339–347. 10.1016/J.OCEANENG.2009.01.001
118
MahjoobiJ.Etemad-ShahidiA. (2008). An alternative approach for the prediction of significant wave heights based on classification and regression trees. Appl. Ocean Res.30, 172–177. 10.1016/J.APOR.2008.11.001
119
MahmoodiK.GhassemiH. (2018). Outlier detection in ocean wave measurements by using unsupervised data mining methods. Pol. Marit. Res.25, 44–50. 10.2478/pomr-2018-0005
120
MakarynskyyO.Pires-SilvaA. A.MakarynskaD.Ventura-SoaresC. (2005). Artificial neural networks in wave predictions at the west coast of Portugal. Comput. Geosciences31, 415–424. 10.1016/J.CAGEO.2004.10.005
121
MartinsG. M.ThompsonR. C.NetoA. I.HawkinsS. J.JenkinsS. R. (2010). Enhancing stocks of the exploited limpet Patella candei d’Orbigny via modifications in coastal engineering. Biol. Conserv.143, 203–211. 10.1016/j.biocon.2009.10.004
122
MasmoudiO.JaouaM.JaouaA.YacoutS. (2021). Data preparation in machine learning for condition-based maintenance. J. Comput. Sci.17, 525–538. 10.3844/JCSSP.2021.525.538
123
MengF.SongT.XuD.XieP.LiY. (2021). Forecasting tropical cyclones wave height using bidirectional gated recurrent unit. Ocean. Eng.234, 108795. 10.1016/J.OCEANENG.2021.108795
124
MészárosL.van der MeulenF.JongbloedG.El SerafyG. (2022). Coastal environmental and atmospheric data reduction in the Southern North Sea supporting ecological impact studies. Front. Mar. Sci.9, 1–23. 10.3389/fmars.2022.920616
125
MillerJ. K.DeanR. G. (2007). Shoreline variability via empirical orthogonal function analysis: part II relationship to nearshore conditions. Coast. Eng.54, 133–150. 10.1016/j.coastaleng.2006.08.014
126
MoncadaA. M.MelesseA. M.VithanageJ.PriceR. M. (2021). Long-term assessment of surface water quality in a highly managed estuary basin. Int. J. Environ. Res. Public Health18, 9417. 10.3390/ijerph18179417
127
MoodyD. I.BrumbyS. P.RowlandJ. C.AltmannG. L. (2014). Land cover classification in multispectral imagery using clustering of sparse approximations over learned feature dictionaries. J. Appl. Remote Sens.8, 084793. 10.1117/1.jrs.8.084793
128
NajafiM. R.MoradkhaniH.WherryS. A. (2011). Statistical downscaling of precipitation using machine learning with optimal predictor selection. J. Hydrol. Eng.16, 650–664. 10.1061/(asce)he.1943-5584.0000355
129
NakamuraT.KuramitsuY.MizutaniN. (2008). Tsunami scour around a square structure. Coast. Eng. J.50, 209–246. 10.1142/S057856340800179X
130
NeshatM.AbbasnejadE.ShiQ.AlexanderB.WagnerM. (2019). “Adaptive neuro-surrogate-based optimisation method for wave energy converters placement optimisation,” in Lecture notes in computer science (including subseries lecture notes in artificial intelligence and lecture notes in bioinformatics). 10.1007/978-3-030-36711-4_30
131
NeumannB.OttK.KenchingtonR. (2017). Strong sustainability in coastal areas: a conceptual interpretation of SDG 14. Sustain. Sci.12, 1019–1035. 10.1007/s11625-017-0472-y
132
NikooM. R.KerachianR.AlizadehM. R. (2018). A fuzzy KNN-based model for significant wave height prediction in large lakes. Oceanologia60, 153–168. 10.1016/J.OCEANO.2017.09.003
133
OehmckeS.ZielinskiO.KramerO. (2015). “Event detection in marine time series data,” in Lecture notes in computer science (including subseries lecture notes in artificial intelligence and lecture notes in bioinformatics). 10.1007/978-3-319-24489-1_24
134
PalM.MatherP. M. (2003). An assessment of the effectiveness of decision tree methods for land cover classification. Remote Sens. Environ.86, 554–565. 10.1016/S0034-4257(03)00132-9
135
ParadinasL. M.JamesN. A.QuinnB.DaleA.NarayanaswamyB. E. (2021). A new collection tool-kit to sample microplastics from the marine environment (sediment, seawater, and biota) using citizen science. Front. Mar. Sci.8. 10.3389/fmars.2021.657709
136
ParkJ.OhJ. (2022). Analysis of collected data and establishment of an abnormal data detection algorithm using principal component analysis and K-nearest neighbors for predictive maintenance of ship propulsion engine. Processes10, 2392. 10.3390/pr10112392
137
ParkS. J.LeeD. K. (2020). Prediction of coastal flooding risk under climate change impacts in South Korea using machine learning algorithms. Environ. Res. Lett.15, 094052. 10.1088/1748-9326/aba5b3
138
PearsonK. (1901). LIII. On lines and planes of closest fit to systems of points in space. Lond. Edinb. Dublin Philosophical Mag. J. Sci.2, 559–572. 10.1080/14786440109462720
139
PeñaE.FerrerasJ.Sanchez-TemblequeF. (2011). Experimental study on wave transmission coefficient, mooring lines and module connector forces with different designs of floating breakwaters. Ocean. Eng.38, 1150–1160. 10.1016/j.oceaneng.2011.05.005
140
PereiraG. C.EbeckenN. F. F. (2009). Knowledge discovering for coastal waters classification. Expert Syst. Appl.36, 8604–8609. 10.1016/J.ESWA.2008.10.009
141
PimentelM. A. F.CliftonD. A.CliftonL.TarassenkoL. (2014). A review of novelty detection. Signal Process.99, 215–249. 10.1016/j.sigpro.2013.12.026
142
PlaatA.KostersW.PreussM. (2023). High-accuracy model-based reinforcement learning, a survey. Artif. Intell. Rev.56, 9541–9573. 10.1007/s10462-022-10335-w
143
PourghasemiH. R.GayenA.ParkS.LeeC. W.LeeS. (2018). Assessment of landslide-prone areas and their zonation using logistic regression, LogitBoost, and naïve Bayes machine-learning algorithms. Sustainability10, 3697. 10.3390/su10103697
144
PourzangbarA.BrocchiniM. (2022). A new process-based, wave-resolving, 2DH circulation model for the evolution of natural sand bars: the role of nearbed dynamics and suspended sediment transport. Coast. Eng.177, 104192. 10.1016/J.COASTALENG.2022.104192
145
PourzangbarA.BrocchiniM.SaberA.MahjoobiJ.MirzaaghasiM.BarzegarM. (2017a). Prediction of scour depth at breakwaters due to non-breaking waves using machine learning approaches. Appl. Ocean Res.63, 120–128. 10.1016/j.apor.2017.01.012
146
PourzangbarA. (2012). Determination of the most effective parameters on scour depth at seawalls using genetic programming (GP). 10th Int. Conf. coasts, ports Mar. Struct. (ICOPMASS 2012).
147
PourzangbarA.LosadaM. A.SaberA.AhariL. R.LarroudéP.VaeziM.et al (2017b). Prediction of non-breaking wave induced scour depth at the trunk section of breakwaters using Genetic Programming and Artificial Neural Networks. Coast. Eng.121, 107–118. 10.1016/j.coastaleng.2016.12.008
148
PourzangbarA.SaberA.Yeganeh-BakhtiaryA.AhariL. R. (2017c). Predicting scour depth at seawalls using GP and ANNs. J. Hydroinformatics19, 349–363. 10.2166/hydro.2017.125
149
PrataJ. C.da CostaJ. P.DuarteA. C.Rocha-SantosT. (2019). Methods for sampling and detection of microplastics in water and sediment: a critical review. TrAC Trends Anal. Chem.110, 150–159. 10.1016/J.TRAC.2018.10.029
150
ProvostE. J.ButcherP. A.ColemanM. A.KelaherB. P. (2020). Assessing the viability of small aerial drones to quantify recreational fishers. Fish. Manag. Ecol.27, 615–621. 10.1111/fme.12452
151
QiaoX.ChuT.TissotP.AliI.AhmedM. (2023). Vertical land motion monitored with satellite radar altimetry and tide gauge along the Texas coastline, USA, between 1993 and 2020. Int. J. Appl. Earth Observation Geoinformation117, 103222. 10.1016/J.JAG.2023.103222
152
RamaswamyS.RastogiR.ShimK. (2000). Efficient algorithms for mining outliers from large data sets. SIGMOD Rec. (ACM Spec. Interes. Gr. Manag. Data)29, 427–438. 10.1145/335191.335437
153
RanasingheR.LarsonM.SavioliJ. (2010). Shoreline response to a single shore-parallel submerged breakwater. Coast. Eng.57, 1006–1017. 10.1016/j.coastaleng.2010.06.002
154
RaoS.MandalS. (2005). Hindcasting of storm waves using neural networks. Ocean. Eng.32, 667–684. 10.1016/J.OCEANENG.2004.09.003
155
ReggianniniM.PapiniO.PieriG. (2022). An automated analysis tool for the classification of sea surface temperature imagery. Pattern Recognit. Image Anal.32, 631–635. 10.1134/S1054661822030336
156
RengarajanD.VaidyaG.SarveshA.KalathilD.ShakkottaiS. (2022). “Reinforcement learning with sparse rewards using guidance from offline demonstration,” in ICLR 2022 - 10th International Conference on Learning Representations, Virtual, April 25–29, 2022.
157
RizianizaI.AisjahA. S. (2015). “Prediction of significant wave height in the Java Sea using artificial neural network,” in Proceeding 2015 International Seminar on Intelligent Technology and Its Applications, ISITIA, Surabaya, Indonesia, May 20–21, 2015 (IEEE (Institute of Electrical and Electronics Engineers)). 10.1109/ISITIA.2015.7219944
158
RokniK.AhmadA.SolaimaniK.HaziniS. (2015). A new approach for surface water change detection: integration of pixel level image fusion and image classification techniques. Int. J. Appl. Earth Observation Geoinformation34, 226–234. 10.1016/J.JAG.2014.08.014
159
Ruiz de Alegría-ArzaburuA.Pedrozo-AcuñaA.Horrillo-CaraballoJ. M.MasselinkG.ReeveD. E. (2010). Determination of wave-shoreline dynamics on a macrotidal gravel beach using Canonical Correlation Analysis. Coast. Eng.57, 290–303. 10.1016/j.coastaleng.2009.10.014
160
SajjadM.LinN.ChanJ. C. L. (2020). Spatial heterogeneities of current and future hurricane flood risk along the U.S. Atlantic and Gulf coasts. Sci. Total Environ.713, 136704. 10.1016/j.scitotenv.2020.136704
161
SakaaB.ElbeltagiA.BoudibiS.ChaffaïH.IslamA. R. M. T.KulimushiL. C.et al (2022). Water quality index modeling using random forest and improved SMO algorithm for support vector machine in Saf-Saf river basin. Environ. Sci. Pollut. Res.29, 48491–48508. 10.1007/s11356-022-18644-x
162
Santamaria CervantesM.Díaz-CarrascoP.MoraguesM. V.ClaveroM.LosadaM. Á. (2022). “Uncertainties of the actual engineering formulas for coastal protection slopes. The dimensional analysis and experimental method,” in Proceedings of the 39th IAHR World Congress From Snow to Sea, Granada, Spain, June 19–24, 2022 (IAHR (International Association for Hydro-Environment Engineering and Research)). 10.3850/iahr-39wc252171192022900
163
SarkarD.ContalE.VayatisN.DiasF. (2016). Prediction and optimization of wave energy converter arrays using a machine learning approach. Renew. Energy97, 504–517. 10.1016/j.renene.2016.05.083
164
SarkarS.GundechaV.GhorbanpourS.ShmakovA.BabuA. R.PichardA.et al (2022). “Skip training for multi-agent reinforcement learning controller for industrial wave energy converters,” in 2022 IEEE 18th International Conference on Automation Science and Engineering (CASE), Mexico City, Mexico, August 20–24, 2022 (IEEE (Institute of Electrical and Electronics Engineers)), 212–219. 10.1109/CASE49997.2022.9926561
165
ScottT.MasselinkG.RussellP. (2011). Morphodynamic characteristics and classification of beaches in England and Wales. Mar. Geol.286, 1–20. 10.1016/j.margeo.2011.04.004
166
SekovskiI.StecchiF.ManciniF.Del RioL. (2014). Image classification methods applied to shoreline extraction on very high-resolution multispectral imagery. Int. J. Remote Sens.35, 3556–3578. 10.1080/01431161.2014.907939
167
ShafaghatM.DezvarehR. (2021). Support vector machine for classification and regression of coastal sediment transport. Arab. J. Geosci.14, 2009. 10.1007/s12517-021-08360-0
168
ShamshirbandS.MosaviA.RabczukT.NabipourN.ChauK. w. (2020). Prediction of significant wave height; comparison between nested grid numerical model, and machine learning models of artificial neural networks, extreme learning and support vector machines. Eng. Appl. Comput. Fluid Mech.14, 805–817. 10.1080/19942060.2020.1773932
169
ShenbagarajN.ManiN. D.MuthukumarM. (2014). Isodata classification technique to assess the shoreline changes of Kolachel to Kayalpattanam coast. Int. J. Eng. Res. Technol.3. 10.17577/IJERTV3IS040136
170
ShiF.KirbyJ. T.HarrisJ. C.GeimanJ. D.GrilliS. T. (2012). A high-order adaptive time-stepping TVD solver for Boussinesq modeling of breaking waves and coastal inundation. Ocean. Model.43-44, 36–51. 10.1016/J.OCEMOD.2011.12.004
171
ShuiP. L.XiaX. Y.ZhangY. S. (2020). Sea-land segmentation in maritime surveillance radars via k-nearest neighbor classifier. IEEE Trans. Aerosp. Electron. Syst.56, 3854–3867. 10.1109/TAES.2020.2981267
172
ShuvoS. S.YilmazY.BushA.HafenM. (2022). Modeling and simulating adaptation strategies against sea-level rise using multiagent deep reinforcement learning. IEEE Trans. Comput. Soc. Syst.9, 1185–1196. 10.1109/TCSS.2021.3122282
173
SierraC.Flor-BlancoG.OrdoñezC.FlorG.GallegoJ. R. (2017). Analyzing coastal environments by means of functional data analysis. Sediment. Geol.357, 99–108. 10.1016/j.sedgeo.2017.06.008
174
SmitM. W. J.AarninkhofS. G. J.WijnbergK. M.GonzálezM.KingstonK. S.SouthgateH. N.et al (2007). The role of video imagery in predicting daily to monthly coastal evolution. Coast. Eng.54, 539–553. 10.1016/J.COASTALENG.2007.01.009
175
SoloyA.TurkiI.LecoqN.Gutiérrez BarcelóÁ. D.CostaS.LaignelB.et al (2021). A fully automated method for monitoring the intertidal topography using Video Monitoring Systems. Coast. Eng.167, 103894. 10.1016/J.COASTALENG.2021.103894
176
SzmytkiewiczM.BiegowskiJ. X.KaczmarekL. M.OkrójT.OstrowskiR. X.PruszakZ.et al (2000). Coastline changes nearby harbour structures: comparative analysis of one-line models versus field data. Coast. Eng.40, 119–139. 10.1016/S0378-3839(00)00008-9
177
TanJ.ChenS.LeeC. Y.DongG.HuW.WangJ. (2021). Projected changes of typhoon intensity in a regional climate model: development of a machine learning bias correction scheme. Int. J. Climatol.41, 2749–2764. 10.1002/joc.6987
178
TanJ.LiuH.LiM.WangJ. (2018). A prediction scheme of tropical cyclone frequency based on lasso and random forest. Theor. Appl. Climatol.133, 973–983. 10.1007/s00704-017-2233-3
179
TayfurG.KarimiY.SinghV. P. (2013). Principle component analysis in conjuction with data driven methods for sediment load prediction. Water Resour. Manag.27, 2541–2554. 10.1007/s11269-013-0302-7
180
TimmermansB. W.GommengingerC. P.DodetG.BidlotJ. R. (2020). Global wave height trends and variability from new multimission satellite altimeter products, reanalyses, and wave buoys. Geophys. Res. Lett.47. 10.1029/2019GL086880
181
TsiakosC. A. D.ChalkiasC. (2023). Use of machine learning and remote sensing techniques for shoreline monitoring: a review of recent literature. Appl. Sci.13, 3268. 10.3390/app13053268
182
TsujimotoG.TamaiM.YamadaF. (2012). Long-term prediction of beach profile and sediment grain size characteristic at low energy beach. Coast. Eng. Proc.1, 14. 10.9753/icce.v33.sediment.14
183
TurnerI. L.HarleyM. D.AlmarR.BergsmaE. W. J. (2021). Satellite optical imagery in coastal engineering. Coast. Eng.167, 103919. 10.1016/J.COASTALENG.2021.103919
184
UddinM. G.NashS.RahmanA.OlbertA. I. (2023). Performance analysis of the water quality index model for predicting water state using machine learning techniques. Process Saf. Environ. Prot.169, 808–828. 10.1016/J.PSEP.2022.11.073
185
UhlF.Græsdal RasmussenT.OppeltN. (2021). Classification ensembles for beach cast and drifting vegetation mapping with Sentinel-2 and PlanetScope. Geosciences12, 15. 10.3390/geosciences12010015
186
van GentM. R. A.van den BoogaardH. F. P.PozuetaB.MedinaJ. R. (2007). Neural network modelling of wave overtopping at coastal structures. Coast. Eng.54, 586–593. 10.1016/j.coastaleng.2006.12.001
187
Van KomenD. F.HowarthK.NeilsenT. B.KnoblesD. P.DahlP. H. (2022). A CNN for range and seabed estimation on normalized and extracted time-series impulses. IEEE J. Ocean. Eng.47, 833–846. 10.1109/JOE.2021.3134719
188
VaralakshmiP.VasumathiN.VenkatesanR. (2021). Tropical Cyclone prediction based on multi-model fusion across Indian coastal region. Prog. Oceanogr.193, 102557. 10.1016/j.pocean.2021.102557
189
VerwegaM. T.TrahmsC.AntiaA. N.DickhausT.PriggeE.PrinzlerM. H. U.et al (2021). Perspectives on marine data science as a blueprint for emerging data science disciplines. Front. Mar. Sci.8. 10.3389/fmars.2021.678404
190
VosK.SplinterK. D.HarleyM. D.SimmonsJ. A.TurnerI. L. (2019). CoastSat: a Google Earth engine-enabled Python toolkit to extract shorelines from publicly available satellite imagery. Environ. Model. Softw.122, 104528. 10.1016/J.ENVSOFT.2019.104528
191
WattelezG.DupouyC.JuillotF. (2022). Unsupervised optical classification of the seabed color in shallow oligotrophic waters from Sentinel-2 images: a case study in the Voh-Koné-Pouembout lagoon (New Caledonia). Remote Sens.14, 836. 10.3390/rs14040836
192
WongY. J.ShimizuY.KamiyaA.ManeechotL.BharambeK. P.FongC. S.et al (2021). Application of artificial intelligence methods for monsoonal river classification in Selangor river basin, Malaysia. Environ. Monit. Assess.193, 438. 10.1007/s10661-021-09202-y
193
XieP.ArkinP. A. (1996). Analyses of global monthly precipitation using gauge observations, satellite estimates, and numerical model predictions. J. Clim.9. 10.1175/1520-0442(1996)009<0840:AOGMPU>2.0.CO;2
194
XuL.LiQ.YuJ.WangL.XieJ.ShiS. (2020). Spatio-temporal predictions of SST time series in China’s offshore waters using a regional convolution long short-term memory (RC-LSTM) network. Int. J. Remote Sens.41, 3368–3389. 10.1080/01431161.2019.1701724
195
XuX.ZhanY.ZhengJ.GengB. (2021). “Classification of coastal altimetric waveforms using machine learning technology,” in 2021 4th International Conference on Information Communication and Signal Processing, ICICSP 2021, Shanghai, China, September 24–26, 2021 (IEEE (Institute of Electrical and Electronics Engineers)). 10.1109/ICICSP54369.2021.9611971
196
YaoH.YangY.FuX.MiC. (2018). An adaptive sliding-window strategy for outlier detection in wireless sensor networks for smart port construction. J. Coast. Res.82, 245–253. 10.2112/SI82-036.1
197
Yeganeh-BakhtiaryA.EyvazoghliH.ShabakhtyN.KamranzadB.AbolfathiS. (2022). Machine learning as a downscaling approach for prediction of wind characteristics under future climate change scenarios. Complexity2022. 10.1155/2022/8451812
198
Yeganeh-bakhtiaryA.GhorbaniM. A.PourzangbarA. (2012). Determination of the most important parameters on scour at coastal determination of the most important parameters on scour at coastal structures. J. Civ. Eng. Urbanism2, 68–71.
199
YuL.SunJ.GuoY.ZhangB.YangG.ChenL.et al (2022). Research on outlier detection in CTD conductivity data based on cubic spline fitting. Front. Mar. Sci.9. 10.3389/fmars.2022.1030980
200
ZanuttighB.FormentinS. M.van der MeerJ. W. (2016). Prediction of extreme and tolerable wave overtopping discharges through an advanced neural network. Ocean. Eng.127, 7–22. 10.1016/J.OCEANENG.2016.09.032
201
Zelada LeonA.HuvenneV. A. I.BenoistN. M. A.FergusonM.BettB. J.WynnR. B. (2020). Assessing the repeatability of automated seafloor classification algorithms, with application in marine protected area monitoring. Remote Sens.12, 1572. 10.3390/rs12101572
202
ZhuangX.LiW.XuY. (2022). Port planning and sustainable development based on prediction modelling of port throughput: a case study of the deep-water dongjiakou port. Sustainability14, 4276. 10.3390/su14074276
203
ZimekA.SchubertE.KriegelH. P. (2012). A survey on unsupervised outlier detection in high-dimensional numerical data. Stat. Analysis Data Min.5, 363–387. 10.1002/sam.11161
204
ZouS.ZhouX.KhanI.WeaverW. W.RahmanS. (2022). Optimization of the electricity generation of a wave energy converter using deep reinforcement learning. Ocean. Eng.244, 110363. 10.1016/J.OCEANENG.2021.110363
Keywords: machine learning, maritime modelling, classification, prediction, critical review
Citation: Pourzangbar A, Jalali M and Brocchini M (2023) Machine learning application in modelling marine and coastal phenomena: a critical review. Front. Environ. Eng. 2:1235557. doi: 10.3389/fenve.2023.1235557
Received: 06 June 2023
Accepted: 17 August 2023
Published: 11 September 2023
Volume: 2 - 2023
Edited by: Jan Hofman, University of Bath, United Kingdom
Reviewed by: Soroush Abolfathi, University of Warwick, United Kingdom; Yong Jie Wong, Kyoto University of Advanced Science, Japan
Copyright: © 2023 Pourzangbar, Jalali and Brocchini.
This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Ali Pourzangbar, ali.pourzangbar@kit.edu