REVIEW article

Front. Environ. Eng., 11 September 2023

Sec. Environmental Impact Assessment

Volume 2 - 2023 | https://doi.org/10.3389/fenve.2023.1235557

Machine learning application in modelling marine and coastal phenomena: a critical review

  • 1. Institute for Water and River Basin Management—Hydraulic Engineering and Water Resources Management, Karlsruher Institut für Technologie (KIT), Karlsruhe, Germany

  • 2. Department of Civil and Building Engineering and Architecture, Università Politecnica delle Marche, Ancona, Italy

  • 3. Department of Civil Engineering, Tehran University, Tehran, Iran


Abstract

This study provides an extensive review of over 200 journal papers focusing on the use of Machine Learning (ML) algorithms to promote sustainable management of marine and coastal environments. The research covers various facets of ML algorithms, including data preprocessing and handling, modeling algorithms for distinct phenomena, model evaluation, and the use of dynamic and integrated models. Given that ML modeling relies on experience or trial and error, examining previous applications in marine and coastal modeling is beneficial. The performance of different ML methods used to predict wave heights was analyzed to ascertain which method performed best on various datasets. The analysis of these papers revealed that properly developed ML methods can be successfully applied to many aspects of marine and coastal management. Areas of application include data collection and analysis, pollutant and sediment transport, image processing and deep learning, and identification of potential regions for aquaculture and wave energy activities. Additionally, ML methods aid in structural design and optimization and in the prediction and classification of oceanographic parameters. However, despite their potential advantages, dynamic and integrated ML models remain underutilized in marine projects. This research provides insights into ML’s application and invites future investigations to exploit ML’s untapped potential for marine and coastal sustainability.

1 Introduction

Coastal areas are of vital significance due to their crucial role in biodiversity, economic activity, cultural heritage, climate regulation, food security, and recreation, as well as their strategic importance (Neumann et al., 2017). Ensuring their sustainability, however, is a challenge that requires addressing various factors, including climate change adaptation, beach protection, and water quality management. One approach to ensuring the sustainability of coastal areas is to examine each contributing factor thoroughly using data analysis and suitable methods. Effective data analysis and augmentation are, therefore, essential for informed decision-making and sustainable management of coastal areas.

The amount of data related to coastal systems has increased dramatically in recent years (Goldstein et al., 2019). These data, which often cover large areas and long periods of time, are now available at high resolution and can be accessed quickly. This has created more opportunities for research on the sustainability of activities taking place in coastal areas. However, handling large and complex datasets and identifying their patterns and trends is not straightforward. Despite their widespread use and mathematical rigor, conventional statistical techniques, including descriptive statistics (Emmanouil et al., 2020), inferential statistics (Agarwal and Manuel, 2008), regression analysis (Davidson et al., 1996; Hall et al., 2002), correlation analysis (Szmytkiewicz et al., 2000; Kroon et al., 2008; Ruiz de Alegría-Arzaburu et al., 2010), Analysis of Variance (ANOVA) (Martins et al., 2010), and Principal Component Analysis (PCA) (Hua et al., 2007; Miller and Dean, 2007), have limitations when processing large and complex datasets and can present challenges in terms of interpretability. This has prompted researchers to explore more sophisticated alternatives such as ML, which enables insights to be drawn from data in a more efficient, accurate, and automated way.

ML is a rapidly growing field with the potential to make significant contributions to the sustainable use and management of marine and coastal environments. It can do so by helping to better understand and predict the impacts of human activities and natural phenomena on coastal ecosystems and by identifying potential threats. The link between using ML to simulate coastal and marine events and sustainability lies in building models and then acting on their results. ML uses large amounts of data to create simulations of different scenarios, such as wave propagation or changes in water quality. These simulations help fine-tune actions, such as improving wave energy converters or changing shipping routes to avoid pollution. Moreover, they can guide efforts to adapt to and mitigate the effects of environmental change, such as coastal erosion caused by rising sea levels. In essence, ML offers crucial insights that contribute to improved, sustainable care of coastal and marine environments.

Typically, the primary input of ML algorithms is a dataset in one of various forms, such as numeric data, images, digital elevation models (DEMs) derived from LiDAR (light detection and ranging), video, and geographic information system (GIS) data, which are mapped and visualized using GIS. The main output of ML algorithms in coastal engineering varies with the specific application and dataset. It includes prediction [e.g., coastal flooding risk (Park and Lee, 2020), storm surge (Sajjad et al., 2020), wave height (Dogan et al., 2021), sediment transport (Pourzangbar et al., 2017b; 2017c; 2017a) and beach erosion (Beuzen et al., 2019)], image processing of satellite imagery (Agrafiotis et al., 2019) or drone footage (Provost et al., 2020), pattern recognition [e.g., patterns of sediment transport (Liu et al., 2021)], placement optimization (Cuadra et al., 2016; Sarkar et al., 2016; Neshat et al., 2019), optimization by identifying the most efficient and cost-effective solutions for protecting the coast from erosion and flooding, monitoring (e.g., using sensor data to detect erosion or changes in water quality), anomaly detection (e.g., unusual changes in water quality), and decision making (Lazuardi et al., 2021) by providing decision support to coastal managers and engineers. However, the applicability of ML approaches in coastal engineering is influenced by various factors such as data quality, computational resources, the complexity of the coastal system, and the choice of appropriate algorithms.

Several ML methods have been used to study the sustainable use of coastal areas, including: Artificial Neural Networks (ANNs), used to predict water quality (Chen and Ma, 2010), classify rivers based on the water quality index (Wong et al., 2021), and predict wave height (Rao and Mandal, 2005; Günaydin, 2008), beach erosion (Hashemi et al., 2010), and tides; Decision Trees (DTs), used for classification tasks such as identifying the dominant environmental factors; Random Forests (RFs), used for regression and classification tasks such as predicting the effect of human activities on the coastal environment and modelling the water quality index (Sakaa et al., 2022); Support Vector Machines (SVMs), used for classification and regression problems such as identifying the most vulnerable areas in coastal zones; K-Nearest Neighbors (KNN), used for classification tasks such as assigning coastal regions to groups based on their sustainability indicators; and Ensemble Methods, used to improve the accuracy of predictions and classifications, for example when predicting the impact of climate change on the sustainability of coastal activities, among others.

ML has been widely used in numerous research studies, but knowledge gaps remain regarding the selection of parameters, the choice of predictive models (dynamic or static), domain adaptation, and the use of integrated models for analyzing complex systems and evaluating the effects of multiple factors. For data treatment, many existing works have relied on simple heuristics or rules of thumb, although more rigorous mathematical and metaheuristic methods for data preprocessing and parameter identification exist; these are highlighted in this paper. Choosing the correct model can be challenging, and there is no definitive method for identifying the most suitable ML model for a given problem. In general, the ML approach used to solve a specific issue is selected by trial and error. However, comparing how models perform under different conditions can help select the most suitable one. To the best of the authors’ knowledge, no single paper offers comprehensive information about the data preprocessing and preparation phase. This paper therefore provides an extensive review of the methodologies employed in coastal engineering to handle datasets. Its main focus is on how ML models contribute to the sustainable use and management of marine and coastal environments, rather than on the technical intricacies of their setup, and its primary goal is to provide a critical review of the literature that has used ML approaches to manage marine phenomena. The review sheds light on how to prepare parameters and datasets for input into ML models, the pros and cons of various models, the suitability of ML methods for certain conditions, and their shortcomings.

Although numerous papers have modeled coastal phenomena using experimental, numerical, and mathematical methodologies, the current paper focuses exclusively on literature that implemented ML techniques to model coastal and marine events. The selected literature spans a broad range of topics, from data preprocessing and parameter considerations to the different kinds of ML models used for various purposes. Given the large number of published papers, we focused on articles published in reputable international journals from publishers such as Elsevier, Springer, IWA, Taylor and Francis, Wiley, and ASCE, retrieved through online searches using relevant keywords. Among these journals, Coastal Engineering (Elsevier, 18 papers) and Ocean Engineering (Elsevier, 17 papers) contributed the most papers in this area. Most of the sources are fairly recent, predominantly from the past 10 years, although this paper also includes some older references that laid the groundwork for newer methods. Roughly, fewer than 5% of the literature we reviewed was published before 2000, about 14% between 2000 and 2010, 22% between 2010 and 2015, and over 60% in the last 10 years.


The paper is structured as follows: Section 2 discusses the key components of data analysis and preprocessing, including data collection and preparation for the modeling process. Section 3 focuses on studies that have applied ML to coastal engineering for sustainable outcomes, and Section 4 evaluates the accuracy and robustness of the different models. Finally, the paper summarizes the information presented and concludes with a list of references.

2 Data preparation (preprocessing)

Data preparation involves transforming raw data into a format that ML algorithms can use to extract insights or predict outcomes. This process is vital in ML because it considerably affects model performance (Kelleher et al., 2015): if data are missing or invalid, the algorithm either cannot process them or yields less precise, possibly erroneous results. The procedure starts with the acquisition of raw data (see Section 2.2), followed by data integration, which consolidates data from various sources into a unified dataset. Next comes data cleansing to rectify missing values and outliers (see Section 2.3), and then the selection of the most pertinent input parameters (feature selection or dimensionality reduction; see Section 2.4). Feature engineering follows, generating new variables from existing parameters using dimensional analysis (DA). Lastly, data transformation alters the scale or distribution of variables, for example through normalization. Figure 1 depicts the phases of data preprocessing and the methods linked with each step, and the following sections explain these methods in detail.
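As a minimal sketch of two of the steps above, the snippet below fills gaps in a regularly sampled time series by linear interpolation (data cleansing) and then applies min-max normalization (data transformation). It assumes NumPy and a 1-D series; the helper names `fill_gaps_linear` and `MinMaxScaler1D` and the wave-height values are hypothetical, and linear interpolation is only one of several possible gap-filling choices.

```python
import numpy as np


def fill_gaps_linear(series):
    """Fill NaN gaps in a regularly sampled 1-D series by linear interpolation.

    NaNs before the first or after the last valid value are set to the nearest
    valid value (np.interp's behaviour at the boundaries).
    """
    series = np.asarray(series, dtype=float)
    idx = np.arange(series.size)
    valid = ~np.isnan(series)
    if not valid.any():
        raise ValueError("series contains no valid values")
    return np.interp(idx, idx[valid], series[valid])


class MinMaxScaler1D:
    """Rescale values to [0, 1] using the min and max of the training data only."""

    def fit(self, x):
        x = np.asarray(x, dtype=float)
        self.min_, self.max_ = x.min(), x.max()
        if self.max_ == self.min_:
            raise ValueError("constant training data cannot be min-max scaled")
        return self

    def transform(self, x):
        return (np.asarray(x, dtype=float) - self.min_) / (self.max_ - self.min_)

    def inverse_transform(self, z):
        return np.asarray(z, dtype=float) * (self.max_ - self.min_) + self.min_


# Hypothetical hourly significant wave heights (m) with two missing records.
hs = [1.2, 1.4, np.nan, 1.8, 2.0, np.nan, 1.6, 1.5]
hs_filled = fill_gaps_linear(hs)

train, test = hs_filled[:6], hs_filled[6:]     # chronological split
scaler = MinMaxScaler1D().fit(train)           # statistics from training data only
train_scaled = scaler.transform(train)
test_scaled = scaler.transform(test)           # may fall outside [0, 1]
```

The scaler is fitted on the training portion only and then reused on the test portion, so information from the test period does not leak into training; test values can therefore fall slightly outside [0, 1].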

FIGURE 1

2.1 Marine data types

In coastal engineering, data can come in different forms (Huang et al., 2015) and can be classified into different types based on their identity, format, and structure. Some examples of coastal data types include:

  • (1) Numeric data (Timmermans et al., 2020), which include measurements of physical parameters such as water level, wave height, current velocity, and sediment concentration. Such data are typically collected using instruments such as tide gauges, wave gauges, current meters, and sediment samplers. Time-series data, such as ocean temperature records, sea level measurements, and storm surge data, are a common example: sequences of observations taken at regular intervals over time.

  • (2) Image data (Vos et al., 2019; Turner et al., 2021), which includes aerial and satellite imagery, as well as ground-based photographs. These data can be used to study coastal morphology, vegetation, and land use patterns.

  • (3) Point Cloud data (Gomez, 2022), represented by a set of 3D points that can be used to create 3D models of coastal terrain and structures. Point cloud data is often collected using light detection and ranging (LiDAR) systems and can be used to create high-resolution digital elevation models (DEMs) of coastal topography.

  • (4) Video data (Smit et al., 2007; Kim et al., 2020; Kim and Kim, 2020), which include footage captured by cameras. These data can be used to observe coastal dynamics and to measure the beach profile, the shoreline position, and wave breaking patterns.

  • (5) Text data (Brown et al., 2021), represented by written or spoken words, can be analyzed using natural language processing (NLP) techniques. Examples of text data in coastal engineering include social media posts, news articles, and scientific publications.

The most well-known methods for collecting the data mentioned above are field observations, remote sensing measurements, experimental studies, and numerical and mathematical models. Both the availability of equipment and the objective of the study influence the choice of data collection method (Prata et al., 2019).

2.2 Marine data resources

Data collection in the marine sciences relies principally on three distinct methods: in-situ observations, remote sensing techniques, and mathematical and numerical models, as outlined by Verwega et al. (2021). In-situ data collection encompasses ship-based measurements, moorings, gliders, autonomous underwater vehicles, drifters and floats, sea-floor optical cables, and laboratory analyses. Field observations remain essential for collecting real-world data on coastal processes, such as wave heights and tidal levels. With proper maintenance, in-situ instruments are highly accurate, but they typically measure at individual points and therefore offer only sparse coverage of large areas. They provide insights into historical climate trends that are not available from remote sensing, and they are less affected by atmospheric conditions. These observations also serve to validate numerical models that simulate coastal processes and predict the behavior of the coastal system, including wave patterns, tidal currents, and shoreline evolution.

Remote sensing acquires data on coastal topography, bathymetry, and other significant parameters from satellite and airborne platforms. Remote sensing technologies are divided into three categories: satellite, ground-based, and drone-based (Elsayed et al., 2021). The data thus collected enable the generation of high-resolution maps of the coastal environment. Although satellites are powerful tools, they face limitations in obtaining high-resolution imagery at regional scale: clouds can hinder data capture, and high-resolution imagery can be challenging to interpret (Elsayed et al., 2021). A combination of satellite- and ground-based remote sensing and drones could be effective in future marine engineering evaluations, and the overall cost of such a combination may be comparable to that of in-situ techniques. Such technology could enable rapid, high-resolution assessments of water conditions and enhance our understanding of water resource processes.

Mathematical and numerical models generate data by simulating real-life systems or processes with mathematical equations and algorithms (Xie and Arkin, 1996). They make it possible to extend observational data, even to the point of simulating future climate scenarios (Eyring et al., 2016). Nonetheless, these models only approximate real-world conditions, and they can cover spatial and temporal scales beyond the scope of observational data (Matthes et al., 2020). Model outputs are typically available on a model-specific grid that depends on the simulation; climate models, for instance, usually provide a four-dimensional space-time grid. Consequently, comparing model outputs with measurements generally requires interpolation or data aggregation. Table 1 provides a detailed summary of the advantages and disadvantages of these data collection methodologies.

TABLE 1

| Data collection category | Method (example refs.) | Accuracy | Spatio-temporal resolution | Selected measured parameters | Pros and cons |
|---|---|---|---|---|---|
| In situ | Sampling: sampling kit (Paradinas et al., 2021) | Extremely precise | Monitoring a single spot | Pressure, wind speed, wave height, sea level | Very accurate; high spatial and temporal resolution; expensive; data often contain many outliers |
| | Land fixed instruments: tide gauges (Qiao et al., 2023) | Good | | | |
| | Offshore fixed instruments: buoys (Meng et al., 2021) | High | | | |
| | Offshore campaigns: moving instruments (Knight et al., 2020) | High | Monitoring a vast area | | |
| Remote sensing | Satellites (Hagenaars et al., 2018; Turner et al., 2021) | Very high | Monitoring a vast area (meters to kilometers) | Mean wave period, significant wave height, ocean temperature, water level, waves and currents | Long-term operation; high data generation; lower cost than in situ methods; dependence on empirical equations; incomplete data availability; system calibration required |
| | Land-based instruments: coastal radar (Gawehn et al., 2020); LiDAR (Irish and White, 1998); video monitoring (Soloy et al., 2021) | Good | | | |
| | Onboard instruments: drones (Joyce et al., 2023) | High | | | |
| Mathematical and numerical models | Basin-wide models: Copernicus Marine Service (Copernicus, 2023) | Depends on benchmarking data, numerical scheme, and selected equations | Depends on the available computational power and input data | Wave height, sediment flux, flow properties, bed level | Synchronization is maintained between all computational outposts; cost-effective compared with in situ and remote sensing; needs validation with other methods |
| | Local-scale models: NSWE (Pourzangbar and Brocchini, 2022); FUNWAVE (Shi et al., 2012); SWAN (Booij et al., 1997) | | | | |

Detailed information on the various data collection methods in coastal engineering.
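The comparison of model output with measurements mentioned above can be sketched in its simplest form: a model time series at one grid point is linearly interpolated onto the (irregular) observation times before computing the mean bias and root-mean-square error. This is an illustrative sketch with a hypothetical function name and values; it assumes both series share a time unit and that any spatial interpolation from the model grid to the station has already been done.

```python
import numpy as np


def model_vs_obs(model_t, model_val, obs_t, obs_val):
    """Interpolate a model time series onto observation times and score it.

    model_t and obs_t must use the same time unit; model_t must be increasing.
    Observations outside the model time span are dropped, not extrapolated.
    Returns (model values at observation times, mean bias, RMSE).
    """
    model_t = np.asarray(model_t, dtype=float)
    model_val = np.asarray(model_val, dtype=float)
    obs_t = np.asarray(obs_t, dtype=float)
    obs_val = np.asarray(obs_val, dtype=float)

    inside = (obs_t >= model_t[0]) & (obs_t <= model_t[-1])
    obs_t, obs_val = obs_t[inside], obs_val[inside]

    model_at_obs = np.interp(obs_t, model_t, model_val)   # linear in time
    err = model_at_obs - obs_val                          # positive = model too high
    bias = err.mean()
    rmse = np.sqrt(np.mean(err ** 2))
    return model_at_obs, bias, rmse


# Hypothetical 3-hourly model output (Hs in m) and irregular buoy records.
model_t = [0.0, 3.0, 6.0, 9.0]          # hours
model_hs = [1.0, 1.3, 1.6, 1.4]
obs_t = [1.5, 4.5, 7.5, 10.0]           # the last record is outside the model span
obs_hs = [1.1, 1.5, 1.5, 1.3]

hs_at_obs, bias, rmse = model_vs_obs(model_t, model_hs, obs_t, obs_hs)
```

Dropping out-of-range observations instead of extrapolating keeps the score from being driven by values the model never produced; aggregation (e.g. hourly means of a high-frequency gauge) is the usual alternative when observations are denser than model output.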

2.3 Data cleaning: outlier detection

Several factors can affect the quality of observational data, including instrument inaccuracies, equipment malfunctions, disruptions from external sources, mistakes during data conversion, communication failures, and significant unforeseen errors (Yu et al., 2022). Such anomalies can pose major threats to operational functionality, downstream operations, system resilience, and cleaner production (Ba-Alawi et al., 2021). They should therefore be detected promptly and the affected data corrected so that the measurements remain realistic.

Anomaly detection methods are generally grouped into several categories (see Figure 2): statistical methods, which use the properties of the underlying data distribution to identify anomalies (Chandola et al., 2009); distance-based methods, which calculate the distance between data points and identify outliers based on a distance threshold (Ramaswamy et al., 2000); density-based methods, which estimate the density of data points and identify outliers as points lying in low-density regions (Ester et al., 1996); machine learning-based methods, which employ supervised, unsupervised, or semi-supervised ML algorithms to detect outliers (Pimentel et al., 2014); and ensemble methods, which combine multiple outlier detection algorithms to improve overall performance (Zimek et al., 2012). The choice of method, or of a combination of methods for better results, depends on the nature of the data and the specific problem being addressed.
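As an illustration of the statistical family, the sketch below flags values using the modified z-score, which replaces the mean and standard deviation with the median and the median absolute deviation (MAD) so that the outliers themselves cannot mask detection. The 0.6745 scaling and the 3.5 threshold are a common rule of thumb, not values taken from the cited studies; the function name and water-level data are hypothetical.

```python
import numpy as np


def robust_zscore_outliers(x, threshold=3.5):
    """Return a boolean mask of outliers based on the modified z-score.

    The median and the median absolute deviation (MAD) replace the mean and
    standard deviation, so a few extreme values cannot inflate the spread
    estimate and hide themselves.
    """
    x = np.asarray(x, dtype=float)
    median = np.median(x)
    mad = np.median(np.abs(x - median))
    if mad == 0:
        # At least half of the values equal the median; flag any deviation.
        return x != median
    modified_z = 0.6745 * (x - median) / mad
    return np.abs(modified_z) > threshold


# Hypothetical tide-gauge water levels (m) with one spike from a sensor glitch.
levels = [0.52, 0.55, 0.58, 0.61, 0.59, 3.10, 0.57, 0.54]
mask = robust_zscore_outliers(levels)
```

A plain mean/standard-deviation z-score on the same series would be less reliable, because the single spike inflates the standard deviation that is supposed to expose it.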

FIGURE 2

Mahmoodi and Ghassemi (2018) used outlier detection algorithms to improve wave height predictions, while Oehmcke et al. (2015) demonstrated the effectiveness of ML for identifying significant events in marine long-term data. Daranda and Dzemyda (2020) developed a method combining the Density-Based Spatial Clustering of Applications with Noise (DBSCAN) clustering algorithm and k-nearest neighbors analysis for detecting marine traffic anomalies. These studies highlight the potential of leveraging advanced algorithms and ML in marine data analysis and decision-making. This section aims to provide a survey of contemporary outlier detection techniques, comparing their motivations, advantages, and disadvantages. Outliers can significantly impact the results, which makes addressing or eliminating them before analysis and model development crucial.

Considering the learning algorithm, there are three main methodologies for outlier detection (Hodge and Austin, 2004): 1) the unsupervised approach, which identifies outliers without prior knowledge of the data (i.e., without labels); the data are treated as a static distribution, and the most distant points are flagged as potential outliers; 2) the supervised classification approach, which requires pre-labeled data and allows for online classification, where the classifier continuously learns the model and classifies new data as normal or abnormal; and 3) the semi-supervised recognition approach, which learns only the normal class from pre-classified data and labels new data as normal or novel based on their proximity to the boundary of normality. The choice of an outlier detection method depends on the data type, the number of vectors and attributes, the speed and accuracy requirements, and the ability to identify outliers accurately. The key factors in choosing a method are selecting an algorithm that can handle the data and defining a suitable neighborhood for the outliers.
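The unsupervised, distance-based idea (flagging the points farthest from the rest) can be sketched by ranking each point by the distance to its k-th nearest neighbour, in the spirit of Ramaswamy et al. (2000). This brute-force NumPy version is a hypothetical illustration suited to small datasets only; the function name, k, the number of reported outliers, and the sample data are assumptions.

```python
import numpy as np


def knn_distance_outliers(X, k=2, n_outliers=1):
    """Rank points by the distance to their k-th nearest neighbour.

    Points whose k-th neighbour is farthest away sit in the sparsest part of
    the data and are returned as outlier candidates. Uses a brute-force
    pairwise distance matrix, so it is only suitable for small datasets.
    Returns (indices of the n_outliers highest-scoring points, all scores).
    """
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))      # Euclidean distances
    np.fill_diagonal(dist, np.inf)                # a point is not its own neighbour
    kth = np.sort(dist, axis=1)[:, k - 1]         # distance to the k-th neighbour
    order = np.argsort(kth)[::-1]                 # largest distance first
    return order[:n_outliers], kth


# Hypothetical (significant wave height m, peak period s) pairs; features
# should normally be rescaled first so that neither unit dominates.
X = np.array([[1.0, 6.0], [1.1, 6.2], [0.9, 5.8], [1.2, 6.1], [1.0, 5.9], [4.0, 2.0]])
idx, scores = knn_distance_outliers(X, k=2, n_outliers=1)
```

Choosing k is the "suitable neighborhood" decision mentioned above: with k = 1, two nearby erroneous records can vouch for each other, whereas a larger k requires an outlier to be isolated from several points.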

2.4 Dimensionality reduction

Incorporating irrelevant parameters can result in intricate models that are considerably harder to interpret and run than models developed using only the most crucial parameters (Pourzangbar, 2012). For this reason, the focus is placed on building ML models from the most crucial parameters: those that strongly influence the model output while being only weakly related to the other input parameters. Several methods exist to derive the most important dimensions (parameters) in the input space, including min/max autocorrelation factor analysis (MAFA), dynamic factor analysis (DFA), the Least Absolute Shrinkage and Selection Operator (LASSO), Independent Component Analysis (ICA), multicollinearity tests, and PCA. Table 2 summarizes some well-known dimensionality reduction approaches used in marine engineering. The latter two methods are explained below.

TABLE 2

| Method | Reference | Field of study | Remarks |
|---|---|---|---|
| Tolerance and VIF based | Kaplan et al. (2010) | Explanatory variables (meteorological and hydrological variables) | • The best explanatory variable was identified, enhancing the overall model fit. • The selected variables were not collinear, ensuring independent influence on the model. |
| | Pourghasemi et al. (2018) | Landslide conditioning factors | • The study found no collinearity among the 17 landslide conditioning factors. • Logistic Regression and LogitBoost outperformed the Naïve Bayes method. |
| | Izadi et al. (2021) | Euphotic depth, sea surface temperature, and chlorophyll | • Stacking the same variables across different days increased the feature space significantly, even though this approach may introduce multicollinearity. • Input parameters that are highly correlated with the output parameter are considered significant. |
| | El-Haddad et al. (2021) | Flood susceptibility prediction | • A multicollinearity analysis was conducted among nine flood-influencing factors. • The tolerance (>0.1) and VIF (<10) of all flood-influencing factors met the accepted thresholds, indicating no multicollinearity. • Therefore, all the independent flood-influencing factors could be used to build the model in that study. |
| | Deroliya et al. (2022) | Flood risk mapping | • Variance Inflation Factor (VIF) analysis was performed; as a result, multicollinearity-free geomorphic flood descriptors (MFGFDs) were used as input features in the ML models. • Pearson correlation coefficients were calculated between all indicators, and no high intercorrelations were found. The model therefore aggregated all available indicators after standardization, without the need for PCA. |
| MAFA and DFA based | Kuo et al. (2019) | Water quality variables | • MAFA results identified the main water quality variables in densely populated zones (Zones 1 and 3). • The primary water quality variations in the agricultural cultivation zone were identified. • DFA results suggest an influence of domestic and municipal effluent pollutants. |
| F-test | Hessami et al. (2008) | Automated regression-based statistical downscaling tool | • The statistical significance of the predictors was examined. |
| PCA | Zhuang et al. (2022) | Port planning | • PCA was used to predict the throughput of Dongjiakou Port. • The model’s effectiveness was verified by comparing predicted outputs with actual outputs. |
| | Park and Oh (2022) | Ship propulsion engine | • PCA and K-Nearest Neighbors were used for data preprocessing, to check whether the data were classified according to engine control characteristics. • Two types of principal components were derived using PCA to simplify the data collected in full-navigation mode, in order to analyze the impact of each factor and reduce the analysis time. |
| | Hua et al. (2007) | Temperature–frequency correlation | • PCA was first used to extract principal components from the measured temperatures for dimensionality reduction. • The dominant feature vectors, together with the measured modal frequencies, were then used in a support vector algorithm to build regression models. |
| | Arslan et al. (2020) | Coastline extraction on hyperspectral imagery | • SVM and neural network classification accuracies did not differ significantly on the provided images. It was therefore concluded that applying dimensionality reduction (DR) strategies to the dataset does not significantly affect the identification of coastline locations. |
| | Freeman et al. (2021) | Marine hydrokinetic (MHK) turbines | • PCA enabled the maximum separation between classes to be depicted; compared with other studies, the authors believe their method allows more insightful inferences from PCA. • The proposed framework can identify the most important dimensions/features (i.e., RMS, skewness) for fault detection when PCA is applied to the feature-space data matrix. |
| | Sierra et al. (2017) | Analyzing coastal environments (grain size frequency curves) | • Functional Principal Component Analysis (FPCA) was identified as a suitable alternative with significant advantages over conventional vector analysis methods, particularly in sedimentary geography studies. |
| | Tayfur et al. (2013) | Sediment load prediction | • Predictive models were developed based on the outcomes of PCA. • The results show that PCA is beneficial in this type of study. |
| | El-Rahman (2016) | Hyperspectral images | • PCA was used to reduce the dimensions of hyperspectral images before classification. • The classification employs the unsupervised Iterative Self-Organizing Data Analysis Technique (ISODATA) algorithm. |
| LASSO | Tan et al. (2018) | Tropical cyclones | • After dimension reduction, the selected predictors retained a high explanatory capability for the complex information in the original data. • They also effectively preserved the features of each predictor. |
| | Tan et al. (2021) | Typhoon intensity | • LASSO and PCA were used for variable selection and dimensionality reduction. • An ML method, the Hierarchical Bayesian Model (HBP), was employed to correct the storm intensity predicted by the Regional Climate Model (RCM). |
| ICA | Najafi et al. (2011) | Statistical downscaling of precipitation | • Performance assessment showed that the procedure successfully selects predictors for downscaling Global Climate Model (GCM) data on both monthly and seasonal timescales. • With appropriate predictors, the Multiple Linear Regression (MLR) model proved an effective method for precipitation downscaling. |

Some well-known dimensionality reduction approaches and their example references.
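The selection principle stated at the start of this section (keep inputs that are strongly related to the output but weakly related to each other) can be illustrated with a simple greedy correlation filter. This is a hypothetical heuristic, not one of the methods in Table 2; it only captures linear (Pearson) relationships, and the function name, thresholds, and synthetic wind, fetch, and wave-height data are assumptions chosen for illustration, not a physical wave model.

```python
import numpy as np


def correlation_filter(X, y, names, max_inter_corr=0.8, min_rel=0.2):
    """Greedy filter: visit inputs from most to least correlated with the output
    and keep one only if its |r| with every already-kept input is below
    max_inter_corr. Inputs with |r| to the output below min_rel are discarded."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    p = X.shape[1]
    # |Pearson r| between each input and the output
    rel = np.array([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(p)])
    # |Pearson r| between every pair of inputs
    inter = np.abs(np.corrcoef(X, rowvar=False))
    kept = []
    for j in np.argsort(rel)[::-1]:                 # most output-relevant first
        if rel[j] < min_rel:
            break                                   # remaining inputs are even weaker
        if all(inter[j, i] < max_inter_corr for i in kept):
            kept.append(j)
    return [names[j] for j in kept], rel


# Hypothetical synthetic data (not a physical wave model).
rng = np.random.default_rng(1)
n = 300
u10 = rng.uniform(2.0, 20.0, n)                    # wind speed (m/s)
gust = 1.3 * u10 + rng.normal(0.0, 0.3, n)         # nearly redundant with u10
fetch = rng.uniform(10.0, 100.0, n)                # fetch (km)
noise_var = rng.normal(size=n)                     # unrelated input
hs = 0.1 * u10 + 0.01 * fetch + rng.normal(0.0, 0.2, n)

X = np.column_stack([u10, gust, fetch, noise_var])
kept, rel = correlation_filter(X, hs, ["U10", "gust", "fetch", "noise_var"])
```

Only one of the two nearly identical wind inputs survives, together with fetch, while the unrelated input is dropped; which wind input is kept depends on which correlates marginally better with the output.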

2.4.1 Multicollinearity

Multicollinearity is a common issue that can arise in regression analysis when two or more predictor variables in a model are highly correlated with each other. This can cause problems in the analysis, such as unstable and unreliable coefficient estimates. There are several methods to detect multicollinearity in a regression model. Here are a few commonly used tests:

  • Correlation matrix: A correlation matrix can be used to identify the degree of correlation between each pair of predictor variables. High correlation coefficients (e.g., greater than 0.7 or 0.8) may indicate multicollinearity.

  • Variance Inflation Factor (VIF) quantifies how much the variance of an estimated regression coefficient is inflated due to multicollinearity. Suppose there are three input parameters, $x_1$, $x_2$ and $x_3$, and the goal is to compute the VIF for $x_1$. To accomplish this, we predict $x_1$ by linear regression on $x_2$ and $x_3$. Next, we determine the correlation coefficient $R_1$ between the predicted and actual values of $x_1$, which we use to calculate $\mathrm{VIF}_1 = 1/(1 - R_1^2)$. Often, VIF values exceeding 5 or 10 serve as a benchmark for identifying variables that might pose problems.

If the VIF values for the independent variables are high, multicollinearity is affecting the regression model. This issue might need to be resolved, for example by removing one of the correlated variables, combining them, or applying methods such as ridge regression or principal component analysis.

  • Condition number: The condition number is a measure of the overall multicollinearity in the model and is calculated as the square root of the ratio of the largest to the smallest eigenvalue of the correlation matrix. Condition numbers greater than 30 may indicate problematic multicollinearity.

  • • Eigenvalues: Eigenvalues of the correlation matrix can also be used to detect multicollinearity. Large eigenvalues (for example, greater than 1) may indicate high levels of multicollinearity.

  • • Tolerance (TOL) is another measure that can be used to detect multicollinearity in a regression model. It is the reciprocal of the VIF (variance inflation factor) and measures the proportion of the variance in a predictor variable that is not explained by the other predictor variables in the model. If the Tolerance value for a variable is close to 1, it suggests that there is no multicollinearity between that variable and the other predictor variables in the model. On the other hand, if the Tolerance value is close to 0, it indicates a high degree of multicollinearity between that variable and the other predictor variables in the model. In general, Tolerance values of less than 0.1 or 0.2 are indicative of problematic multicollinearity.
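For the simplest case of two predictors with correlation coefficient $r$, the diagnostics above have closed forms, which makes their relationship easy to see. The following worked example is an illustration added here, not a result from the cited studies:

```latex
\begin{aligned}
R &= \begin{pmatrix} 1 & r \\ r & 1 \end{pmatrix}, \qquad
\lambda_{\max} = 1 + |r|, \quad \lambda_{\min} = 1 - |r| \\
\mathrm{VIF} &= \frac{1}{1 - r^2}, \qquad
\mathrm{TOL} = \frac{1}{\mathrm{VIF}} = 1 - r^2, \qquad
\kappa = \sqrt{\frac{\lambda_{\max}}{\lambda_{\min}}} = \sqrt{\frac{1 + |r|}{1 - |r|}} \\
r = 0.9:\quad \mathrm{VIF} &= \frac{1}{0.19} \approx 5.26, \qquad
\mathrm{TOL} = 0.19, \qquad \kappa = \sqrt{19} \approx 4.36
\end{aligned}
```

At $r = 0.9$ the VIF and tolerance criteria are just triggered, whereas $\kappa$ stays far below 30 (it reaches 30 only when $|r| = 899/901 \approx 0.998$), so the individual diagnostics can disagree and are best interpreted together.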

It is important to note that none of these tests can definitively prove the presence of multicollinearity, but rather provide evidence that it may be present in a model. Therefore, it is important to use multiple tests and to interpret the results in the context of the specific research question and data being analyzed.

2.4.2 Principal component analysis

PCA can be used for dimensionality reduction (Pearson, 1901). PCA reduces the dimensionality of a dataset while increasing its interpretability. To achieve this, PCA maps the data onto a new coordinate system of uncorrelated variables (the principal components), chosen so that they capture as much of the variance as possible. The redundant information shared by correlated parameters is concentrated in a few leading components, so the remaining low-variance components can be discarded with minimal information loss. The method initially proposed by Pearson was limited to up to three parameters; Harold Hotelling later described methods for computing multivariate PCA (Hotelling, 1933).

In the mathematical description, it is assumed that the input space contains $p$ parameters with $n$ measurements of each parameter. Hence, the (centered) input matrix $X \in \mathbb{R}^{n \times p}$ has $n \times p$ components. The input space can be transformed into a feature space whose dimensions are uncorrelated with each other, represented by a matrix $Y \in \mathbb{R}^{n \times p}$. The transformation can be done using a whitening or sphering transformation matrix ($W$) as follows:

$$Y = X W$$

The primary goal of PCA is to identify the components of the transformation matrix such that the new variables exhibit maximum variance. With some mathematical manipulation, the following equation for the transformation matrix can be derived:

$$C_X = E \Lambda E^{\top}, \qquad W = E \Lambda^{-1/2}$$

where $C_X$ is the covariance matrix of the input space ($X$), $\Lambda$ is a diagonal matrix whose components are the eigenvalues ($\lambda_i$) of $C_X$, and $E$ is a matrix whose columns are the eigenvectors of $C_X$, ordered by decreasing eigenvalue. The first columns of $XE$ therefore carry the largest variance, and multiplying by $\Lambda^{-1/2}$ rescales each component to unit variance; keeping only the first $d < p$ columns of $W$ reduces the dimensionality.
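As a rough sketch of this transformation, the plain-Python code below centers the data, eigen-decomposes the covariance matrix with the cyclic Jacobi method, and applies a whitening matrix assumed here to be $W = E\Lambda^{-1/2}$. The data are synthetic and the function names are hypothetical; the Jacobi solver is only a stand-in that keeps the example dependency-free, and a library routine such as `numpy.linalg.eigh` would normally be used instead.

```python
import math
import random


def matmul(P, Q):
    """Plain-Python product of two matrices stored as lists of rows."""
    return [[sum(P[i][t] * Q[t][j] for t in range(len(Q))) for j in range(len(Q[0]))]
            for i in range(len(P))]


def transpose(P):
    return [list(col) for col in zip(*P)]


def covariance(X):
    """Center an n x p data matrix; return (sample covariance C_X, centered X)."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    Xc = [[row[j] - means[j] for j in range(p)] for row in X]
    C = [[sum(r[i] * r[j] for r in Xc) / (n - 1) for j in range(p)] for i in range(p)]
    return C, Xc


def jacobi_eig(C, sweeps=50, tol=1e-22):
    """C = E diag(lam) E^T for symmetric C via cyclic Jacobi rotations,
    with eigenpairs sorted by decreasing eigenvalue."""
    p = len(C)
    A = [row[:] for row in C]
    E = [[float(i == j) for j in range(p)] for i in range(p)]
    for _ in range(sweeps):
        if sum(A[i][j] ** 2 for i in range(p) for j in range(p) if i != j) < tol:
            break
        for k in range(p - 1):
            for l in range(k + 1, p):
                # Rotation angle that zeroes A[k][l]: tan(2*theta) = 2*A_kl / (A_ll - A_kk).
                theta = 0.5 * math.atan2(2 * A[k][l], A[l][l] - A[k][k])
                c, s = math.cos(theta), math.sin(theta)
                J = [[float(i == j) for j in range(p)] for i in range(p)]
                J[k][k], J[l][l], J[k][l], J[l][k] = c, c, s, -s
                A = matmul(transpose(J), matmul(A, J))
                E = matmul(E, J)
    lam = [A[i][i] for i in range(p)]
    order = sorted(range(p), key=lambda i: lam[i], reverse=True)
    return [lam[i] for i in order], [[row[i] for i in order] for row in E]


def pca_whiten(X, d=None):
    """Y = Xc W with W = E Lambda^{-1/2}, keeping the d leading components."""
    C, Xc = covariance(X)
    lam, E = jacobi_eig(C)
    d = len(lam) if d is None else d
    W = [[E[r][i] / math.sqrt(lam[i]) for i in range(d)] for r in range(len(E))]
    return matmul(Xc, W), lam, W


# Synthetic, strongly correlated three-parameter dataset.
rng = random.Random(0)
X = []
for _ in range(500):
    z1, z2, z3 = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
    X.append([z1, 0.8 * z1 + 0.6 * z2, 0.5 * z1 - 0.3 * z2 + 0.2 * z3])
Y, lam, W = pca_whiten(X)
print("eigenvalues:", [round(v, 3) for v in lam])
```

The covariance of the returned `Y` is (numerically) the identity matrix. Passing `d=2` would keep only the two leading components, which is the dimensionality-reduction use of PCA.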

2.5 Dimensional analysis

Although numerous methods exist for DA, most studies employ the Buckingham π theorem to render the parameters dimensionless. Table 3 summarizes some of the studies that applied DA before feeding data to their ML models.

TABLE 3

| Method | Reference | Field of study | Dimensionless relationship (well-known numbers) |
|---|---|---|---|
| The Buckingham π theorem | Bateni et al. (2007) | Scour depth prediction around bridge piers using ML approaches | Reynolds number; Froude number |
| | Tayfur et al. (2013) | PCA and data-driven methods for enhancing sediment load prediction | Reynolds number; Froude number; dimensionless sediment diameter; mobility number |
| | Macayeal et al. (2011) | Iceberg-capsize tsunamigenesis | Froude number |
| | Jayaratne et al. (2016) | Tsunami-induced local scour and failure mechanisms in coastal structures | Shields parameter |
| | Deng et al. (2016) | Wave force on a vertical cylinder | Reynolds number; scattering parameter; Keulegan–Carpenter number; Froude number |
| | Ranasinghe et al. (2010) | Shoreline response to a single submerged, shore-parallel breakwater | |
| | Nakamura et al. (2008) | Tsunami-induced scour around a square structure | |
| | Peña et al. (2011) | Comparative experimental analysis of wave transmission coefficients, mooring line and module connector forces across various floating breakwater designs | |
| | Hong et al. (2013) | Propeller jet-induced scour | Froude number; offset height ratio; relative submergence |
| | Karimpour et al. (2016) | Impacts of wind waves and currents on saltmarsh fringe deterioration | Reynolds number |
| | Santamaria Cervantes et al. (2022) | Uncertainties in coastal protection slope formulas | Reynolds number; wave steepness; relative water depth |
| | Kitsikoudis et al. (2015) | Sediment transport in sand-bed rivers | Reynolds number; Froude number; Shields parameter |

Comparative overview of various studies utilizing DA and their derived dimensionless parameters.
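To illustrate how dimensional measurements become dimensionless groups like those listed in Table 3, the sketch below computes a Reynolds number, a Froude number and a relative depth for flow around a bridge pier. It is a hypothetical example: the choice of pier diameter as the Reynolds length scale, of flow depth in the Froude number, the constants, and the input values are assumptions, and the groups actually used by Bateni et al. (2007) may be defined differently.

```python
import math

G = 9.81      # gravitational acceleration (m/s^2)
NU = 1.0e-6   # kinematic viscosity of water, approx. at 20 °C (m^2/s), assumed


def reynolds_number(velocity, length, nu=NU):
    """Re = U L / nu: ratio of inertial to viscous forces."""
    return velocity * length / nu


def froude_number(velocity, depth, g=G):
    """Fr = U / sqrt(g h): flow velocity relative to shallow-water wave celerity."""
    return velocity / math.sqrt(g * depth)


def pier_scour_features(velocity, pier_diameter, flow_depth):
    """Hypothetical feature builder turning dimensional measurements into
    dimensionless ML inputs (Re, Fr and relative depth h/D)."""
    return {
        "Re": reynolds_number(velocity, pier_diameter),
        "Fr": froude_number(velocity, flow_depth),
        "h/D": flow_depth / pier_diameter,
    }


# Hypothetical measurement: U = 1.2 m/s, D = 0.5 m, h = 2.0 m.
features = pier_scour_features(1.2, 0.5, 2.0)
print(features)
```

Feeding such groups to an ML model, instead of the raw velocities and lengths, makes the learned relationship independent of the unit system and of the scale of the experiment.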

2.6 Normalization

Normalizing data transforms it to a common scale, which removes the influence of differing units, avoids bias in statistical analyses and makes results comparable and meaningful, especially when data come from different sources. Normalization also plays a crucial role in efficient machine and deep learning by ensuring that inputs with large numerical values are processed effectively (Van Komen et al., 2022). The choice of normalization method depends on the specific requirements of the data and the problem being solved. Some widely used normalization methods are summarized in Table 4.

TABLE 4

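| Method | Equation | Studies using this method |
|---|---|---|
| Max-min normalization | $x' = a + \dfrac{(x - x_{\min})(b - a)}{x_{\max} - x_{\min}}$ | Pourzangbar et al. (2017b); Kramer (2013) |
| Z-score normalization | $x' = \dfrac{x - \mu}{\sigma}$ | Ewuzie et al. (2021); Masmoudi et al. (2021) |
| Sigmoid normalization | $x' = \dfrac{1}{1 + e^{-x}}$ | Latif et al. (2023) |
| Log scaling | $x' = \log(x)$ | Bai et al. (2015); Pourzangbar et al. (2017a) |

Well-known normalization techniques used in ML modeling.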

In Table 4, the transformed data $x'$ are obtained by normalizing the original data $x$ into a new range. The original data are contained in a vector $X$, whose minimum and maximum values are $x_{\min}$ and $x_{\max}$, respectively. The chosen minimum and maximum of the transformed range are $a$ and $b$, which are typically set to 0 and 1, respectively. $\mu$ is the mean of the data, and $\sigma$ is the standard deviation of the data.

Min-max normalization rescales a feature linearly to a specific range, usually between 0 and 1: the minimum value of $X$ is mapped to 0, the maximum value to 1, and intermediate values to corresponding values between 0 and 1. To avoid exact zeros in the model, the target range is sometimes narrowed to [0.1, 0.9] instead. Because it scales the data linearly to a fixed range, it is a commonly used method for making variables comparable.

Z-score normalization, also known as standardization, centers the data on the mean and rescales them to a mean of 0 and a standard deviation of 1, which allows easier comparison of values. It does not make the data normally distributed: the shape of the distribution, including any skewness, is unchanged. It is commonly used in statistics, ML and data analysis.

Sigmoid normalization transforms the data with a sigmoid function, which can be useful when the data distribution is asymmetrical. The sigmoid function maps any input value to a value between 0 and 1 and is commonly used in ML and ANN models to represent a probability or to rescale data. Additionally, it is differentiable, which makes it useful in optimization problems and in backpropagation in neural networks.
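The three transformations can be sketched with the Python standard library. This is a minimal illustration under stated assumptions: the population standard deviation is used for σ (the sample version differs slightly), the sigmoid is applied to z-scores, which is one common variant rather than necessarily the form used in the cited studies, and the wave heights are hypothetical.

```python
import math
import statistics


def min_max(values, a=0.0, b=1.0):
    """Min-max normalization to [a, b]; (a, b) = (0.1, 0.9) avoids exact zeros."""
    lo, hi = min(values), max(values)
    return [a + (x - lo) * (b - a) / (hi - lo) for x in values]


def z_score(values):
    """Standardization to mean 0 and (population) standard deviation 1."""
    mu, sigma = statistics.fmean(values), statistics.pstdev(values)
    return [(x - mu) / sigma for x in values]


def sigmoid(values):
    """Logistic function 1 / (1 + exp(-x)), mapping every value into (0, 1)."""
    return [1.0 / (1.0 + math.exp(-x)) for x in values]


heights = [0.4, 0.9, 1.3, 2.1, 3.8]   # hypothetical significant wave heights (m)
print(min_max(heights))
print(min_max(heights, 0.1, 0.9))
print(z_score(heights))
print(sigmoid(z_score(heights)))      # sigmoid applied to z-scores
```

Note that the minimum and maximum (or mean and standard deviation) should be computed on the training data only and then reused for validation and test data, otherwise information leaks from the test set into the model.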

In coastal phenomena, the relationship between inputs and outputs is typically nonlinear, but certain models, such as the M5 model tree, fit linear models in their leaves and can therefore represent nonlinearity only in a piecewise-linear way. To address this limitation, M5 models have been implemented using a logarithmic form for both inputs and outputs (i.e., the natural logarithm of inputs and outputs). This logarithmic form is more accurate than a linear formulation because it better captures the nonlinear nature of the contributing parameters (Pourzangbar et al., 2017a; Afsarian et al., 2018). Log scaling transforms data points by applying a logarithmic function, which compresses large values much more strongly than small ones and thus makes skewed data more symmetrical and manageable for analysis. The base of the logarithm (e.g., base 10, base 2, or the natural logarithm) is chosen according to the data and the analysis to be performed. Despite its advantages, normalization can also have drawbacks: loss of interpretability, sensitivity to outliers (as with min-max scaling and z-score), loss of information, dependence on statistics of the entire dataset, difficulties with categorical features, and varying effects across algorithms.
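The benefit of the logarithmic form can be seen with a single linear regression: a model that is linear in $\ln x$ and $\ln y$ corresponds to the power law $y = c\,x^{k}$ in the original variables. The sketch below is a simplified stand-in for one linear leaf model of an M5 tree (it does not implement M5 itself); the data are synthetic, generated from $y = 2x^{1.5}$, and the function names are hypothetical.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit y = intercept + slope * x for one predictor."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return my - slope * mx, slope


def fit_power_law(xs, ys):
    """Fit y = c * x**k by linear regression on (ln x, ln y); returns (c, k)."""
    intercept, k = fit_line([math.log(x) for x in xs], [math.log(y) for y in ys])
    return math.exp(intercept), k


# Synthetic data following y = 2 x^1.5 exactly.
xs = [0.5, 1.0, 2.0, 4.0, 8.0]
ys = [2.0 * x ** 1.5 for x in xs]
c, k = fit_power_law(xs, ys)
print(round(c, 6), round(k, 6))  # 2.0 1.5
```

A straight line fitted directly to `xs` and `ys` could not reproduce this curvature, whereas in log space the relationship is exactly linear and the coefficient and exponent are recovered.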

3 AI learning algorithms and their application in marine/coastal engineering

3.1 Supervised-based ML methods

Supervised ML is a powerful approach that requires labeled data for model training. Its versatility permits its use across a variety of applications, such as image and speech recognition, natural language processing, and predictive analytics. Common algorithms used in supervised learning include linear regression, logistic regression (LR), decision trees, random forests, support vector machines, and neural networks. A key advantage of supervised learning is its capacity to generate precise predictions for new, unseen data (Jiang et al., 2020). However, it also has drawbacks, including the need for labeled data, dependence on the quality and quantity of the training data, and the potential for overfitting. Despite these challenges, supervised learning is an essential method in ML and data science, often achieving high accuracy with less computational time than physical models. Although marine processes are inherently complex, supervised ML models have proven beneficial in understanding coastal phenomena and are therefore widely applied in coastal engineering to build innovative models and solve intricate problems (as summarized in Table 5). Supervised ML models have been employed to predict wave parameters such as significant wave height and period and wave reflection and transmission coefficients (van Gent et al., 2007; Gandomi et al., 2020; Kuntoji et al., 2020), as well as tide levels (Lee, 2004), ocean currents and wind fields (James et al., 2018; Shamshirband et al., 2020), wind characteristics under future climate change scenarios (Yeganeh-Bakhtiary et al., 2022), flood inundation using a Gaussian process model (Donnelly et al., 2022), and breakwater stability numbers and wave overtopping discharge, among others. Various ML models, such as ANNs and SVMs, can be employed for these predictions.
ML models have also found application in morphological and morphodynamic predictions, including profile elevation, area, and length, based on parameters such as wind speed and direction, wave height, and beach angle (Hashemi et al., 2010).

TABLE 5

| Learning approach | Model type | Algorithm | Output (reference) |
|---|---|---|---|
| Supervised learning | Classification | ANN; SVM; RF; K-nearest neighbor; Naive Bayes classifier | Coastal vulnerability map (Ennouali et al., 2023); coastal waters classification (Pereira and Ebecken, 2009); coastal altimetric waveforms (Xu et al., 2021); sea surface temperature imagery (Reggiannini et al., 2022) |
| | Regression | ANN; SVM; regression (linear, logistic) | Wave conditions (James et al., 2018); significant wave height (Ali et al., 2023); breaking wave height (Duong et al., 2023); sediment load (Latif et al., 2023); wave attenuation (Kim et al., 2022) |
| Unsupervised learning | Clustering | K-means and K-median; hierarchical clustering; density-based clustering; Gaussian mixture models | Seabed color (Wattelez et al., 2022); land cover classification (Moody et al., 2014); characteristics of wastewater discharges (Di et al., 2019); smart port construction (Yao et al., 2018); spatiotemporal outlier detection (Chen et al., 2016); outliers in coastal water temperature (Cho et al., 2013); coastal environmental and atmospheric data reduction (Mészáros et al., 2022); surface water quality (Moncada et al., 2021) |
| | Anomaly detection | Statistical-based; distance-based; clustering-based; density-based | |
| | Dimensionality reduction | PCA | |
| Reinforcement learning | Model-free | Q-learning; hybrid; policy optimization | Real-time control of coastal urban stormwater systems (Bowes et al., 2022); flood mitigation (Bowes et al., 2021); maximizing energy efficiency (Sarkar et al., 2022) |
| | Model-based | Q-learning; given the model | |

ML learning approaches used in coastal studies, with their associated model types and algorithms (the references in the last column apply to all model types listed under each learning approach).

3.2 Unsupervised-based ML methods

Unsupervised learning is a form of ML that operates without predefined labels or target outcomes (Bishop and Nasrabadi, 2006). Its main purpose is to discover patterns, structures, and relationships in data independently. Common applications include clustering, anomaly detection, and dimensionality reduction: clustering groups similar data points, anomaly detection highlights unusual patterns (as detailed in Section 2.3), and dimensionality reduction reduces the number of features while preserving essential information (as seen in Section 2.4). Algorithms such as k-means clustering, hierarchical clustering, PCA, and autoencoders are frequently used in unsupervised learning to identify patterns in data. Although unsupervised learning can be challenging because it lacks a distinct optimization goal, it holds a vital position in ML, contributing to advances in fields such as computer vision, natural language processing, and recommendation systems. In coastal engineering, k-means clustering can be used to group data such as maximum oceanic wind speeds around cluster centroids, with each centroid recomputed as the average of the data currently assigned to it (Baboo and Tajudin, 2013). PCA can be employed in coastal engineering to examine correlation matrices (Roseman et al., 2005) and to identify major changes in beach profiles and sand grain size distributions (Tsujimoto et al., 2012). Moreover, PCA and hierarchical clustering can help characterize coastal plan shape and hydrodynamics; for instance, using clustering, arc-shaped coasts, whose form is largely controlled by geological structure, can be divided into four broad categories that reflect actual conditions (Scott et al., 2011). By identifying the key components of the data, PCA can help elucidate its underlying patterns and structures.
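A minimal sketch of the k-means procedure mentioned above, applied to one-dimensional data: each value is assigned to its nearest centroid and each centroid is then recomputed as the mean of its members, until the centroids stop changing. This is the generic Lloyd algorithm with a simple deterministic initialization, not necessarily the variant used by Baboo and Tajudin (2013), and the maximum wind speeds are hypothetical.

```python
def kmeans_1d(values, k, max_iter=100):
    """Lloyd's k-means for 1-D data; returns (centroids, labels)."""
    data = sorted(values)
    # Deterministic initialization: centroids spread evenly over the sorted data.
    centroids = [data[(2 * i + 1) * len(data) // (2 * k)] for i in range(k)]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each value joins its nearest centroid.
        labels = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        # Update step: each centroid becomes the mean of its members.
        new = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            new.append(sum(members) / len(members) if members else centroids[c])
        if new == centroids:  # centroids (and hence assignments) have stabilized
            break
        centroids = new
    return centroids, labels


# Hypothetical maximum wind speeds (m/s) from twelve events.
winds = [8.1, 9.0, 7.5, 8.7, 15.2, 16.0, 14.8, 15.5, 24.9, 26.1, 25.3, 23.8]
centroids, labels = kmeans_1d(winds, k=3)
print([round(c, 2) for c in centroids], labels)
```

With real data, k is usually chosen by comparing several values (e.g., via the within-cluster variance), and several initializations are tried because Lloyd's algorithm only finds a local optimum.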

3.3 Reinforcement-based ML methods

Reinforcement learning (RL) is a type of ML in which a program, known as an agent, learns to perform tasks from feedback provided by its environment in the form of rewards or penalties (Rengarajan et al., 2022). The agent makes a sequence of decisions in a changing environment, aiming to learn the optimal way of acting (the policy) that maximizes rewards over time. This process is typically formulated as a Markov Decision Process (MDP), comprising states, actions, transition functions, and reward functions. There are two main types of RL algorithms: model-based and model-free. Model-based RL is like drawing a map: the agent learns (or is given) a model of how the environment responds to its actions and uses it to plan. Model-free RL does not build such a map; it learns what to do directly from the states, actions and rewards it experiences. Model-based RL is thus more about planning ahead, whereas model-free RL is more about learning on the go (Plaat et al., 2023). Model-free methods such as Q-learning do not need a model of the environment; they estimate the expected (discounted) sum of future rewards for each possible action in each state and update these estimates using the Bellman equation. Q-learning has been used successfully in many different tasks and is one of the most commonly used model-free RL algorithms. In coastal engineering, RL can be used to develop control policies that reduce the risk of flooding (Bowes et al., 2021). Deep reinforcement learning (DRL), an advanced form of RL, can be used to control wave energy converters and has been found to outperform traditional control methods (Anderlini et al., 2020). DRL can also adapt to changes in system dynamics, allowing control to continue even when faults occur. Moreover, RL has been used to maximize the electricity produced by wave energy converters (Zou et al., 2022). In addition, multiagent reinforcement learning, a type of RL, can simulate the social and economic effects of sea level rise.
This can be a useful tool for planning scenarios, analyzing costs and benefits, and optimizing strategies to adapt to changes (Shuvo et al., 2022).
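Tabular Q-learning applies the Bellman-based update $Q(s,a) \leftarrow Q(s,a) + \alpha\,[\,r + \gamma \max_{a'} Q(s',a') - Q(s,a)\,]$ after every step. The sketch below uses a hypothetical six-state chain in which only reaching the right-hand end is rewarded; the environment and the hyperparameters (`ALPHA`, `GAMMA`, `EPSILON`) are illustrative assumptions and do not represent any of the coastal applications cited above.

```python
import random

N_STATES = 6                            # hypothetical chain of states 0..5; 5 is the goal
ACTIONS = (-1, +1)                      # action 0 moves left, action 1 moves right
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.2   # learning rate, discount, exploration rate


def step(state, action):
    """Toy deterministic environment: reward 1 on reaching the goal, else 0."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def q_learning(episodes=1000, seed=0):
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            # Epsilon-greedy selection; ties are broken at random.
            if rng.random() < EPSILON or Q[s][0] == Q[s][1]:
                a = rng.randrange(2)
            else:
                a = 0 if Q[s][0] > Q[s][1] else 1
            s2, r, done = step(s, ACTIONS[a])
            # Bellman target: immediate reward plus discounted best next value.
            target = r if done else r + GAMMA * max(Q[s2])
            Q[s][a] += ALPHA * (target - Q[s][a])
            s = s2
    return Q


Q = q_learning()
policy = ["right" if q[1] > q[0] else "left" for q in Q[:-1]]
print(policy)
```

No transition model is used anywhere: the agent only sees `(state, action, reward, next state)` samples, which is what makes the method model-free.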

Table 5 summarizes the various ML learning approaches and their corresponding model types, each of which uses its own set of algorithms. For instance, classification tasks may use ANNs or SVMs. The final column of the table lists studies that applied each learning approach to investigate a specific coastal process or event.

3.4 AI contribution to the sustainability of marine environments

Predictive models, whether statistical, numerical or ML-based, play a vital role in marine and coastal engineering in safeguarding structures against natural forces. Statistical models use past data to forecast future conditions, while numerical models simulate the processes by solving mathematical equations. ML models, a branch of artificial intelligence (AI), learn from past data to make predictions. Each approach has its own strengths, and the choice depends on data availability and specific project needs.

Various ML techniques have been applied in the study of coastal and marine environments. Figure 3, based on Scopus data, shows the percentage of published papers that have used different ML methods since 2000. It indicates that Principal Components Regression (PCR), Linear Model (LM), Regression Tree (RT), and ANN are the ML algorithms most frequently employed for analyzing coastal and marine phenomena, whereas techniques such as General Regression Neural Networks (GRNN), the M5 model tree, Bayesian Model Averaging (BMA), Generalized Boosted Regression (GBM), and Extreme Gradient Boosting (XGBoost) have been applied less frequently.

FIGURE 3

Figure 4 shows the application trend of different ML approaches for coastal and marine phenomena. Previously, techniques such as Deep Neural Networks (DNN), Convolutional Neural Networks (CNN), and RF were sparingly employed in diverse studies. However, there has been a significant increase in their use over the past 5 years, demonstrating a growing reliance on these methods in recent research.

FIGURE 4

Figure 5 illustrates the trend of various ML algorithms in coastal and marine applications since 2008. Infrequently used methods are colored in red; these less common approaches are not reported in Figure 4.

FIGURE 5

3.4.1 Prediction of oceanographic and morphologic parameters

Researchers use ML algorithms and soft computing techniques to predict oceanographic and morphological parameters, as shown in Figure 6. These methods include ANNs, SVMs, Support Vector Regression (SVR), Fuzzy Logic (FL), evolutionary algorithms such as Genetic Programming (GP), and DTs, among others. Predictive models are widely used in oceanography and coastal management, and their accuracy critically depends on several factors: the dataset used for training, the type and configuration of the ML model, the tuning parameters, the termination condition, and the choice of input and output parameters. Which algorithm is most valuable, and with which carefully tuned parameters, depends on the problem being addressed.

FIGURE 6

ANNs, SVRs, the M5 decision tree algorithm, and Recurrent Neural Networks (RNNs), including Long Short-Term Memory (LSTM) models, are used to predict wave heights, as in the studies of Duong et al. (2023) and Rizianiza and Aisjah (2015). These ML techniques have shown reliable wave prediction capabilities, maintaining accuracy up to 72 h ahead (Jain and Deo, 2008). The use of intact structural data for predicting significant wave heights has been explored, and the critical role of data quality in training ANNs for wave height prediction has been emphasized (Ciortan and Rusu, 2018; Demetriou et al., 2021). ANNs have also been used to estimate breaking wave heights from factors such as seabed slope, water depth, and deep-water wavelength (Duong et al., 2023). In marine energy forecasting, multi-class classification with ordinal classifiers such as SVOREX and SVORIM has yielded precise results (Fernández et al., 2015). RNNs, especially LSTM models, have been employed to predict motion responses in irregular waves (Kagemoto, 2020). Table 6 summarizes the ten most highly cited papers on predicting significant wave height with ML algorithms. Most of these studies used meteorological data and past wave heights as input parameters. The results demonstrate that LSTM neural networks, ANNs, kernel-based predictors such as SVM and SVR, and decision trees can predict wave height accurately.

TABLE 6

| References | Method | Dataset | Inputs | Results |
|---|---|---|---|---|
| Mahjoobi and Etemad-Shahidi (2008) | C5 algorithm; CART; ANN | Wind and wave data from Lake Michigan, 2000–2004 | Wind speed; wind direction | Decision trees, with error statistics similar to those of ANNs and an acceptable error range, are efficient for prediction and advantageous because they can be expressed as classification rules and linear equations. |
| Mahjoobi and Adeli Mosabbeb (2009) | SVM (RBF); SVM (polynomial); ANN (MLP); ANN (RBF) | 2086 records for training and 2007 records for testing | Wind speed | With wind speed as input, SVM outperformed ANN. |
| Fernández et al. (2015) | ELMOR*; KDLOR*; ONN*; POM*; SVOREX*; SVORIM*; SVR | Meteorological reanalysis data and standard buoy data for the full years 2012 and 2013 (1 January to 31 December) | Meteorological variables: air temperature, sea level pressure, and the zonal and meridional wind components | With meteorological variables as inputs, the ordinal classifiers (SVOREX and SVORIM) outperformed nominal classification and regression methods. |
| Cornejo-Bueno et al. (2016) | GGA-ELM* | Two complete years of data (1 January 2009 to 31 December 2010) | Wind direction and speed; gust speed; significant wave height; dominant and average wave period; direction DPD; atmospheric pressure; air and water temperature | A hybrid GGA-ELM approach is proposed to select the input features that yield accurate predictions; the selected features, tested with ELM and SVM on a real-world problem, yielded good results. |
| Berbić et al. (2017) | ANN; SVM | Wave height data collected at two Adriatic Sea locations, November 2007–2008 | Previous wave heights | The study used Weka software to predict significant wave heights with ANN and SVM, incorporating wind data. |
| Kumar et al. (2017) | MRAN*; GAP-RBF*; SVR | Data from 13 stations across diverse global regions, 2011–2015 | Wind speed; wave height | MRAN and GAP-RBF outperform SVR and ELM in daily wave height prediction, with MRAN surpassing GAP-RBF while using minimal network resources and accurately predicting significant wave heights. |
| Kumar et al. (2018) | SLFN; Ens-ELM*; SVR; OS-ELM* | The model is trained on data from 10 stations in diverse terrain from 2011 to 2014 and tested on data from early to mid-2015 | Wave and atmospheric data | Ens-ELM outperforms ELM, OS-ELM and SVR in daily wave height prediction. |
| Ali and Prasad (2019) | ICEEMDAN-ELM | Hs data from Queensland, 2000–2018, at half-hourly intervals | Wave heights at previous times | The hybrid ICEEMDAN-ELM outperforms comparative models such as RF, ELM and MLR at Australia's energy sites. |
| Fan et al. (2020) | LSTM neural network | Hourly data from ten ocean buoys worldwide (428,770 data points) | Previous wave height, sea surface temperature, wind direction and speed, and pressure | LSTM showed strong long-term prediction capacity, and the proposed SWAN-LSTM model improved prediction accuracy by over 65% compared with the standard SWAN model. |
| Shamshirband et al. (2020) | ANN; ELM*; SVR | Wave data recorded at Bushehr and Assaluye ports during 2008, used as target variables | Wind speed; wave height | All models (ANN, ELM and SVR) predict effectively, and a nested grid approach proved efficient for the study bathymetry; ELM slightly outperforms ANN and SVR, although their performances are generally similar. |

Details of selected reviewed papers that used ML methods to predict wave height.

*MRAN, minimal resource allocation network; GAP-RBF, growing and pruning radial basis function; ELM, extreme learning machine; GGA-ELM, grouping genetic algorithm - extreme learning machine approach; Ens-ELM, ensemble of extreme learning machines; OS-ELM, online sequential ELM; KDLOR, kernel discriminant learning for ordinal regression; SVOR, support vector ordinal regression; SVORIM, SVOR with implicit constraints; SVOREX, SVOR with explicit constraints; POM, proportional odds model; ONN, ordinal neural networks; ELMOR, ELM adapted to ordinal regression.

To better understand the effectiveness of various ML methods in predicting wave height, the correlation coefficients and Root Mean Square Error (RMSE) values obtained with different ML techniques on multiple datasets have been visualized (Figure 7). To achieve this, we selected studies that applied several ML methods to wave height prediction on a single, consistent dataset, which allows the performance of these techniques to be compared for specific datasets. By comparing the overall performance of these ML models across the various datasets, the following conclusions can be drawn:

  • ANN and SVR algorithms are commonly used in predicting wave height.

  • The number of neurons in the hidden layers of ANNs has only a slight influence on model accuracy.

  • Integrated algorithms, such as ICEEMDAN-ELM, perform better than other ML methods in terms of accuracy and error indices.

  • The adoption of ML algorithms, especially integrated algorithms, has increased significantly in recent years (see Figure 8).
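Figure 7 compares models using the correlation coefficient and the RMSE. The following sketch shows how the two metrics are computed and why both are needed: a prediction that is perfectly correlated with the observations but systematically biased still has a large RMSE. The observed and predicted wave heights are hypothetical values chosen for illustration.

```python
import math


def rmse(observed, predicted):
    """Root mean square error, in the units of the target (e.g. metres for Hs)."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def pearson_r(observed, predicted):
    """Pearson correlation coefficient between observations and predictions."""
    n = len(observed)
    mo, mp = sum(observed) / n, sum(predicted) / n
    cov = sum((o - mo) * (p - mp) for o, p in zip(observed, predicted))
    so = math.sqrt(sum((o - mo) ** 2 for o in observed))
    sp = math.sqrt(sum((p - mp) ** 2 for p in predicted))
    return cov / (so * sp)


# Hypothetical observed and predicted significant wave heights (m).
obs = [1.0, 1.5, 2.0, 2.5, 3.0]
model_a = [1.1, 1.4, 2.1, 2.4, 3.1]   # small scattered errors
model_b = [1.5, 2.0, 2.5, 3.0, 3.5]   # perfectly correlated, but biased by +0.5 m
for name, pred in [("A", model_a), ("B", model_b)]:
    print(name, round(rmse(obs, pred), 3), round(pearson_r(obs, pred), 3))
```

Model B has the higher correlation coefficient but a five times larger RMSE, which is why the two indices are reported side by side.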

FIGURE 7

FIGURE 8

The M5 decision tree algorithm, ANNs, and gradient boosting decision trees are robust tools for predicting wave overtopping discharge at coastal infrastructure such as breakwaters. For wave runup, the M5 decision tree algorithm, trained on laboratory data with multiple input parameters, shows promising predictive capability (Abolfathi et al., 2016). ANNs are also used to predict wave reflection and transmission coefficients (Zanuttigh et al., 2016; Formentin et al., 2017). Gradient boosting decision trees, a newer ML technique, have been shown to improve the accuracy of predicting mean wave overtopping discharge nearly threefold compared with traditional neural networks (den Bieman et al., 2020). Kernel-based approaches, such as Gaussian Process Regression (GPR) and SVR, have also been used to predict wave overtopping, with GPR outperforming ANNs and empirical formulas (Hosseinzadeh et al., 2021).

Measuring Sea Surface Temperature (SST) is vital for understanding the global climate, as SST contributes significantly to climate modeling, weather forecasting, and studies of marine ecosystems. Accurately predicting SST can help mitigate the environmental harm caused by rising water temperatures due to human-induced climate change, which benefits marine ecosystems as well as coastal economies and the broader coastal environment (Choi et al., 2023). LSTM neural networks have proven effective in forecasting SST, performing better when an appropriate amount of input data is used (Xu et al., 2020). Multivariate LSTM models, which consider factors such as wind speed and sea-level air pressure alongside SST, have outperformed univariate models that use SST alone (Balogun and Adebisi, 2021). Several ML models have also been compared for spatio-temporal time series prediction, highlighting the importance of spatial data; among them, the LSTM model was the most efficient, showing a 25% improvement in forecasting performance (based on RMSE) when spatial information was incorporated (Kartal, 2023). Research thus indicates that LSTMs, whether univariate or multivariate, surpass other ML models in predicting SST (Xu et al., 2020; Kartal, 2023).

Accurate predictions of coastal sediment transport are also crucial for managing coastal erosion and development; traditionally, sediment transport has been estimated using experimental methods. Artificial intelligence-based methods can potentially improve decision-making for managing coastal erosion and development (Bakhtyar et al., 2008; Kabiri-Samani et al., 2011), provided that valid input data and appropriate activation functions are selected (Pourzangbar, 2012; Yeganeh-bakhtiary et al., 2012). AI and ML methods such as Adaptive Network-Based Fuzzy Inference Systems (ANFIS), Fuzzy Inference Systems (FIS) and ANNs have been employed to model sediment transport and compared with empirical formulas such as the CERC (Coastal Engineering Research Center), Walton-Bruno (WB) and Van Rijn (VR) formulas, with ANFIS showing higher accuracy and reliability in estimating longshore sediment transport rates (LSTR) (Bakhtyar et al., 2008; Hashemi et al., 2010). SVR has also been employed and has outperformed neural networks when the dataset is small or the relationships are linear, or nonlinear but with a clear margin (Dezvareh and Shafaghat, 2020). Deep learning models based on ANNs have been developed to address the shortcomings of numerical models in analyzing simultaneous sand and sediment transport (Kim and Aoki, 2021).

3.4.2 Classification models

Classification involves categorizing items or data into groups based on their features and is crucial in fields such as statistics, ML and data analysis. The goal is to create models that predict the class of new items by identifying patterns in their features. SVM was introduced in the 1990s and RF in the early 2000s, while LR has roots going back to the 19th century. These algorithms are capable of executing simple tasks such as recognition and classification (Lou et al., 2021). In addition, a variety of other classification algorithms, including the naive Bayes classifier, DTs, and K-Nearest Neighbors, have been used in the analysis of remote sensing and in situ data to enhance the understanding and monitoring of the environment.

Table 7 summarizes the most well-known classification models used in coastal and marine engineering. These algorithms have proven effective in unraveling complex environmental data and facilitating informed decision-making (Tsiakos and Chalkias, 2023). The most widely used classification methods are:

  • SVM (Cortes and Vapnik, 1995): focuses on the training samples closest to the optimal class boundary (the support vectors) and seeks the boundary that maximizes the margin between classes. Fundamentally, it is a binary classifier. Multi-class problems are commonly handled by applying it to every pair of classes (one-versus-one), so processing time grows with the number of classes.

  • Regression Tree (RT) (Goldstein et al., 2019): RTs break prediction tasks down into binary splits, forming a tree structure. They perform well on classification tasks and make the influence of input variables easy to understand. However, RTs may be less effective for continuous variables and are prone to overfitting if not properly pruned. Accuracy can be improved by boosting. This combines many small RTs fitted sequentially, with each new tree giving more weight to poorly predicted data.

  • Decision Trees (Pal and Mather, 2003): DTs are easy to understand and recursively split the data. They can handle categorical data and classify quickly. However, DTs may suffer from overfitting and non-optimal solutions, which can be mitigated through pruning.

  • RF (Breiman, 2001): an ensemble classifier that combines multiple DTs to overcome their limitations. Each tree is trained on a random subset of the training data and features, and the trees are combined into a more accurate ensemble. RF classifiers are known for their speed, resistance to overfitting and ability to handle multicollinearity. They can also assess the importance of variables, although they may be sensitive to certain sampling strategies (Belgiu and Drăguţ, 2016).

  • K-Nearest Neighbor (K-NN) classifier (Altman, 1992): unlike other classifiers, K-NN does not build a model during the training phase. Instead, every unclassified sample is compared directly with the stored training data and assigned the majority class among its k nearest training samples.

  • Naive Bayes classifier: a classification algorithm based on Bayes' theorem that assumes the features are conditionally independent of one another given the class. It learns the probability distributions of the features and corresponding labels from a training dataset and uses them to classify new examples. The algorithm is widely used in applications with many features and large datasets, such as text classification, sentiment analysis and spam filtering, because it is computationally efficient and handles high-dimensional data well.
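To make the "binary split" idea behind RTs and DTs concrete, the following pure-Python sketch finds the single best threshold on one numeric feature by minimizing the weighted Gini impurity. This is the criterion used by CART-style classification trees. A full tree would repeat this search over all features and recurse into each child node. The wave-height values and "calm"/"storm" labels are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity 1 - sum(p_k^2) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(x, y):
    """Return (threshold, weighted_gini) of the best split x <= t vs. x > t.

    threshold is None when no split lowers the impurity of the parent node.
    """
    pairs = sorted(zip(x, y))
    n = len(pairs)
    best = (None, gini(y))  # baseline: impurity of the unsplit node
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate identical values
        t = (pairs[i - 1][0] + pairs[i][0]) / 2.0  # midpoint threshold
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (t, score)
    return best


# Hypothetical data: significant wave height (m) and a sea-state label
hs = [0.4, 0.8, 1.1, 1.6, 2.3, 2.9, 3.4]
label = ["calm", "calm", "calm", "calm", "storm", "storm", "storm"]
threshold, impurity = best_split(hs, label)
print(threshold, impurity)
```

Here the split falls between 1.6 m and 2.3 m and separates the two classes perfectly, so the weighted impurity drops to zero.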

TABLE 7

| References | Method | Dataset | Inputs | Output(s) | Results |
|---|---|---|---|---|---|
| Heumann (2011) | Object-Based Image Analysis (OBIA) combining a DT with SVM classification | Images from the WorldView-2 sensor | Vegetation field data | Mangrove associates | True mangroves were identified with over 94% accuracy. Fringe mangroves were difficult to map because of spectral and zoning issues, especially in sparse or degraded areas. |
| Kalkan et al. (2013) | Object-based classification (OBC) and SVM | The Lakeland region of Turkey and Landsat 8 data | Coastline features |  | Automatic coastline extraction methods were compared with manual digitization, showing that both methods achieved sub-pixel accuracy in detecting coastline features from Landsat 8 imagery. |
| Kong et al. (2017) | GS-optimized SVM | 324 sampling sites across the Yellow Sea and East China Sea | DO, Chl-a, C1, C2, C3, C4 and the TRIX index | Eutrophication status of coastal waters | The method demonstrated high predictive performance and accurate classification of eutrophication status. The findings support the feasibility of using SVM for rapid evaluation of eutrophication status from easily measured parameters. |
| Adam et al. (2014) | RF and SVM | A dataset from the 2010 KwaZulu-Natal provincial LULC map | RapidEye image | Land-use/cover (LULC) map | High spectral variation challenges RF and SVM in classifying certain LULC types, but incorporating the red-edge band significantly improves the classification accuracy of vegetation cover types. |
| Li and Wang (2011) | RF and Markov chain | Tianjin Environmental Aspect Bulletin, 1998–2009 | Time series of sea water quality | Sea water quality | RF and a Markov chain were used to fit a function relating transition probability to pollution and environmental investment, based on historical data. |
| Liu et al. (2021) | CNN | Coastal images and tidal data (20+ years) | Hourly coastal images and tidal data | Categorized beach states (8 classes) | The CNN provides location and shape information on the offshore dam, coastline, waves at the coastal dam, and trough data for classification decision-making, and it generalizes well. |
| Hoonhout et al. (2015) | Structured Support Vector Machine (SSVM) | Manually annotated dataset of 192 coastal images | Coastal images | Pixel-wise classification (water, sand, vegetation, sky, object) | Pixel classification accuracy: 93.0%. The algorithm extracts beach widths and water lines from coastal camera images without manual quality control. This enables the analysis of large, long-term coastal imagery datasets and application to various types of coastal images. The annotated dataset and open-source software are freely available, promoting further research in coastal image analysis. |
| Shafaghat and Dezvareh (2021) | SVM | Coasts of Hormozgan province | Wave height, direction, period and particle size | Sediment transport rate | SVM with a Gaussian (RBF) kernel and optimal coefficients C = 9 and σ = 0.28 accurately categorizes the sediment transport rate into critical and non-critical states for each beach. |
| Mahjoobi and Etemad-Shahidi (2008) | CART and C5 algorithm | 5 years (2000–2004) of wave and wind data from Lake Michigan | Wave and wind data | Significant wave height | The decision trees showed error statistics similar to those of ANNs. The decision tree approach is efficient and successful for predicting significant wave heights and, unlike neural networks, allows the decision rules to be visualized. |
| Çelik and Gazioğlu (2022) | SVM, MLP and Ensemble Learning (EL) | Bedrock, beach and artificial coasts | Coastlines |  | Bedrock coasts: classifiers were accurate and gave similar results on unshaded coasts. Shadows caused extraction errors, and MLP classifiers with Linear, Logarithmic and Tanh activation functions were the most accurate. Beach coasts: shallow depths and suspended solids reduced accuracy. EL classifiers and SVMs with a sigmoidal kernel were adversely affected, while other SVMs and MLP classifiers gave the best results. Artificial coasts: all classifiers were accurate. |
| Shenbagaraj et al. (2014) | ISODATA (unsupervised classifier) | Landsat TM and ETM+ sensor images, toposheets and Google Earth images over a 60-year period (1953–2013) between Kolachel and Kayalpattanam | Coastline changes |  | The approach effectively identified areas of coastline transgression and regression in the study area. |
| Rokni et al. (2015) | ANN; SVM; Maximum Likelihood | August 2000 to July 2010; Lake Urmia, northwest Iran | Fused images highlighting changed areas, classified maps |  | The approach effectively detected surface water changes, especially with the Gram-Schmidt-ANN and Gram-Schmidt-SVM techniques. Lake Urmia lost about one third of its surface area over 2000–2010. |
| Sekovski et al. (2014) | Four supervised image classification techniques (Parallelepiped, Gaussian Maximum Likelihood, Minimum-Distance-to-Means and Mahalanobis distance) and the unsupervised ISODATA | 40 km stretch of coastline in the Municipality of Ravenna, Northern Adriatic Sea, Italy; satellite imagery from 2011 and lidar data from 2005 | High-resolution multispectral WorldView-2 satellite imagery (2011) and airborne lidar data (2005) | Delineated shorelines | Shorelines produced by ISODATA and Mahalanobis showed the highest agreement with the reference shorelines (average median distance 2.2 m). Parallelepiped and Maximum Likelihood shorelines had the largest average median distances (5.1 and 5.6 m, respectively). Heterogeneous coastal stretches showed a larger offset between extracted and reference shorelines than homogeneous ones. Comparing the Mahalanobis classification with lidar data revealed an erosive trend over a wide portion of the study area. |

Overview of highly-cited literature studies (extracted from Scopus) on classification models in coastal and marine phenomena.

The KNN classifier has been used in various marine-related projects. In the design of marine hydrokinetic turbines, KNN was used to identify and categorize the severity of rotor blade pitch imbalance in marine current turbines. This approach was found useful for fault detection and severity classification (Freeman et al., 2021). In ocean surface current forecasting, KNN was used as an alternative method (Jirakittayakorn et al., 2017) and proved capable of forecasting surface currents up to 24 h in advance. When compared with other prediction techniques such as ARIMA, exponential smoothing and LSTM, the KNN model achieved the highest accuracy. KNN was also one of six ML classifiers used to produce spatial predictions of seabed substrate and to map seabed habitats (Diesing and Stephens, 2015; Leon et al., 2020). Prediction accuracy was evaluated against ground-truth samples segmented into seabed substrate classes. In coastal hazard projection, KNN was used to project hazards under several representative concentration pathway (RCP) climate change scenarios, regional climate models and sea level rise ratios (Park and Lee, 2020). In seafloor classification, KNN was used together with ANN to classify the structure of the seafloor and to pinpoint potential anthropogenic effects on delicate benthic assemblages (Gauci et al., 2016). Finally, in sea-land segmentation for maritime surveillance radars, KNN was used to produce a pixel-level sea-land segmentation of the scene based on the Doppler bandwidth of the return vector (Shui et al., 2020).
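As described above, KNN fits no model during training. Each new sample is compared with all stored training samples and assigned the majority label among its k nearest neighbours. The following pure-Python sketch follows that description using Euclidean distance and simple majority voting. Both choices are assumptions; the cited studies may use other distance metrics, feature scaling or distance weighting. The backscatter/depth features and substrate labels are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training samples.

    No model is fitted: every prediction scans the full training set.
    """
    if not 1 <= k <= len(train_x):
        raise ValueError("k must be between 1 and the number of training samples")
    # Euclidean distance from the query to every stored training sample
    dists = sorted((math.dist(x, query), y) for x, y in zip(train_x, train_y))
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


# Hypothetical features: (backscatter in dB, depth in m) -> substrate class
X = [(-15.0, 20.0), (-14.0, 22.0), (-16.0, 18.0),
     (-30.0, 40.0), (-29.0, 42.0), (-31.0, 38.0)]
y = ["gravel", "gravel", "gravel", "mud", "mud", "mud"]
print(knn_predict(X, y, (-17.0, 21.0)))
```

Because raw distances are used, features with larger numeric ranges dominate the vote. In practice, features are usually scaled before applying KNN.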

The Naive Bayes classifier has been applied in several coastal and marine studies. A prominent application is the prediction of water quality classes using seven popular Water Quality Index (WQI) models (Uddin et al., 2023). Because current WQI models use differing techniques, there is some uncertainty about the proper classification of water quality. To address this, Naive Bayes was compared with other ML classifiers, including SVM, Random Forest, K-Nearest Neighbor and Gradient Boosting, to determine the best classifier for evaluating water quality. Naive Bayes has also been used to detect small-scale assemblages of drifting vegetation and beach cast along Germany's Baltic coast (Uhl et al., 2022). To obtain the best classification results, Varalakshmi et al. (2021) used Naive Bayes within an ensemble of five classifiers, together with RF, CART, SVM and a stochastic gradient boosting classifier. This ensemble predicted tropical cyclones through multi-model fusion across the Indian coastal region. In these applications, the Naive Bayes classifier contributed to accurate models, particularly for predicting coastal water quality and detecting small-scale assemblages of drifting vegetation and beach cast. Its simplicity and versatility make it a popular choice across domains.
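The "naive" conditional-independence assumption means that the score of each class factorizes into a prior times one likelihood per feature. The sketch below assumes Gaussian per-feature likelihoods (Gaussian Naive Bayes), which is only one common variant. The cited studies may have used others, and the dissolved-oxygen/turbidity samples and quality labels are hypothetical.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y):
    """Estimate class priors and per-feature mean/variance for each class."""
    groups = defaultdict(list)
    for row, label in zip(X, y):
        groups[label].append(row)
    model = {}
    for label, rows in groups.items():
        n = len(rows)
        means = [sum(col) / n for col in zip(*rows)]
        # A small variance floor avoids division by zero for constant features
        variances = [max(sum((v - m) ** 2 for v in col) / n, 1e-9)
                     for col, m in zip(zip(*rows), means)]
        model[label] = (n / len(X), means, variances)
    return model


def predict_gaussian_nb(model, row):
    """Return the class with the highest log prior + sum of log likelihoods."""
    def log_posterior(params):
        prior, means, variances = params
        score = math.log(prior)
        for v, m, s2 in zip(row, means, variances):
            # Features are treated as independent given the class (the "naive" part)
            score += -0.5 * math.log(2 * math.pi * s2) - (v - m) ** 2 / (2 * s2)
        return score
    return max(model, key=lambda label: log_posterior(model[label]))


# Hypothetical samples: (dissolved oxygen in mg/L, turbidity in NTU) -> class
X = [(8.1, 2.0), (7.9, 3.1), (8.4, 1.5), (4.2, 25.0), (3.8, 30.0), (4.5, 22.0)]
y = ["good", "good", "good", "poor", "poor", "poor"]
model = fit_gaussian_nb(X, y)
print(predict_gaussian_nb(model, (7.5, 4.0)))
```

Working with log-probabilities avoids numerical underflow when many small per-feature likelihoods are multiplied. This is one reason the method scales well to high-dimensional data.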

For coastline extraction from satellite images, three well-known families of methods have been implemented: image processing techniques, unsupervised classifiers and supervised classifiers. Shenbagaraj et al. (2014) employed visual interpretation and ISODATA (Iterative Self-Organizing Data Analysis Technique) classification to extract shorelines from several sources spanning a 60-year period (1953–2013) between Kolachel and Kayalpattanam. These sources were Landsat Thematic Mapper (TM) and Enhanced Thematic Mapper Plus (ETM+) sensor images, toposheets and Google Earth images. This approach effectively identified the areas of coastline transgression and regression in the study area. Supervised classifiers have also been employed to classify satellite images and detect the coastline position. Examples include Maximum Likelihood (Rokni et al., 2015), SVM, ANN and EL (Çelik and Gazioğlu, 2022), RF (Bayram et al., 2017), and Minimum-Distance-to-Means and Mahalanobis distance (Sekovski et al., 2014). As depicted in Figure 9, the average median distance of the extracted shorelines from the reference shoreline indicates that the ISODATA and Mahalanobis methods align best with the reference, with a discrepancy of 2.2 m. Conversely, the Parallelepiped and Maximum Likelihood methods produced shorelines with the largest average median distances from the reference shoreline, 5.1 m and 5.6 m, respectively.
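The "median distance" used above to compare extracted and reference shorelines can be illustrated with a short sketch. It assumes both shorelines are polylines of (x, y) points in metres. For each extracted vertex, it measures the shortest distance to the reference polyline and then takes the median. This is an assumption for illustration: Sekovski et al. (2014) may have measured offsets differently (for example, along shore-normal transects), and the coordinates are hypothetical.

```python
import math
from statistics import median


def point_segment_distance(p, a, b):
    """Shortest distance from point p to the line segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0.0:  # degenerate segment: distance to the single point
        return math.hypot(px - ax, py - ay)
    # Projection parameter of p onto the segment, clamped to [0, 1]
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len2))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def median_offset(extracted, reference):
    """Median of the shortest distances from extracted vertices to the reference line."""
    segments = list(zip(reference[:-1], reference[1:]))
    return median(
        min(point_segment_distance(p, a, b) for a, b in segments)
        for p in extracted
    )


# Hypothetical shorelines in metres: reference along y = 0, extracted slightly offshore
reference = [(0.0, 0.0), (50.0, 0.0), (100.0, 0.0)]
extracted = [(0.0, 2.0), (25.0, 1.5), (50.0, 3.0), (75.0, 2.5), (100.0, 2.0)]
print(median_offset(extracted, reference))
```

Averaging such median offsets over several coastal stretches would then yield an "average median distance" per classification method. The median is less sensitive than the mean to a few badly misclassified vertices.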

FIGURE 9

4 Summary and conclusion

This study provides a comprehensive review of machine learning applications for modelling marine and coastal environments, covering everything from data preprocessing to the application of different models. The review indicates that appropriately implemented and optimized ML methods can contribute significantly to marine and coastal sustainability. They do so by providing accurate and robust models for predicting wave height, oceanographic parameters and sediment transport, for image processing, and for optimizing the design of coastal and marine structures.

The main insights from this review are:

  • 1. Dependence on data quality: the efficacy of ML models relies heavily on the quality of the datasets, the type and configuration of the ML model, and the tuning of its parameters. This underlines the importance of sound data science practices when applying ML.

  • 2. Exploitation of data: this paper underlines the importance of data preprocessing, including data cleaning, dimensionality reduction, and normalization in machine learning models. This emphasizes the pivotal role of quality data in the effectiveness of ML applications in modelling phenomena such as wave patterns, coastal erosion, and sediment transport in marine and coastal environments.

  • 3. Diverse machine learning approaches: this paper examines three primary types of ML, namely supervised, unsupervised and reinforcement learning, and their respective applications in marine and coastal science. Supervised learning, using algorithms such as decision trees and neural networks, leverages labeled data to predict parameters such as wave height and wind speed and to make morphodynamic predictions. Unsupervised learning, on the other hand, independently discovers patterns and relationships in data for tasks such as clustering and anomaly detection. It has been employed to classify wind values and examine beach profiles. Reinforcement learning, which operates on a reward or penalty system, plays a vital role in devising control policies and planning for future scenarios in areas such as flood risk reduction and wave energy conversion. Various ML methods, such as PCR, LM, RT and ANN, are instrumental in facilitating these applications.

  • 4. Classification algorithms: kernel- and tree-based classification algorithms play crucial roles in interpreting environmental data and supporting decision-making. SVM is fundamentally a binary classifier, while RT and DT classify quickly and make the influence of input variables easier to understand. RF is robust against overfitting and handles multicollinearity efficiently. The KNN classifier needs no training phase and labels unclassified samples by direct comparison with the training data. Naive Bayes, based on Bayes' theorem, processes high-dimensional data efficiently and has been used to predict water quality classes and tropical cyclones.

  • 5. Application of ML: from forecasting oceanographic and morphologic parameters to estimating longshore sediment transport rates, ML significantly enhances the capacity to predict and understand marine and coastal environments. ANNs and SVR are frequently used for wave height prediction, and their accuracy and reliability support crucial tasks such as managing coastal erosion and development. The prediction of SST using ML, specifically LSTM neural networks, has shown great promise. Accurate SST prediction can contribute significantly to climate modeling, weather forecasting and the preservation of marine ecosystems. ANFIS has shown accuracy and reliability in estimating longshore sediment transport rates, which are essential inputs for coastal management.

  • 6. The growing role of new techniques: the rising prominence of deep neural networks, convolutional neural networks, and random forests is indicative of the evolution of the field, and the increasing complexity of the problems being addressed. These advanced techniques often deliver superior performance and can manage more complex and high-dimensional datasets. Integrated algorithms such as ICEEMDAN-ELM exhibit superior performance. The adoption of ML algorithms has seen a significant increase in recent years.

4.1 Recommendations for future research endeavours

  • Developing hybrid models: combined and hybrid models have shown significant success, notably in addressing multifaceted problems. For example, Eslaminezhad et al. (2022) improved the efficiency of tree-structured machine learning models in determining the key parameters for forecasting flood susceptibility and constructing flood susceptibility maps by incorporating the BPSO algorithm.

  • Developing physics-based machine learning: current machine learning models often do not adequately account for the underlying physics of the problem. Integrating physics-based machine learning approaches is therefore recommended for further investigation.

  • Implementing domain adaptation techniques: domain adaptation techniques should be considered to address the limited transferability of existing models beyond the regions for which they were developed.

  • Evaluating models' uncertainty: uncertainty is inherent in any model. Model uncertainty should therefore be consistently documented, and appropriate methods should be used to reduce it.

  • Development of appropriate scaling techniques: appropriate scaling prevents features with large numeric ranges from dominating the model, so that all features can contribute to the final prediction. This can improve the performance of machine learning models.
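Two common scaling techniques are min-max normalization, x' = (x − min(X)) / (max(X) − min(X)), and z-score standardization, x' = (x − μ)/σ. Their symbols match those defined in the Glossary (X, μ, σ, and the minimum and maximum of X). The following minimal sketch assumes one numeric feature per list. The optional arguments reflect standard practice: the statistics should be computed on the training data only and reused for test data, which avoids information leakage. The wave height and period values are hypothetical.

```python
from statistics import mean, pstdev


def min_max_scale(values, lo=None, hi=None):
    """Map values to [0, 1] via (x - min) / (max - min).

    lo/hi default to the data's own min/max; pass the training-set values
    when scaling test data so that both use the same transformation.
    """
    lo = min(values) if lo is None else lo
    hi = max(values) if hi is None else hi
    if hi == lo:
        raise ValueError("cannot min-max scale a constant feature")
    return [(x - lo) / (hi - lo) for x in values]


def z_score(values, mu=None, sigma=None):
    """Standardize values via (x - mu) / sigma (population std by default)."""
    mu = mean(values) if mu is None else mu
    sigma = pstdev(values) if sigma is None else sigma
    if sigma == 0:
        raise ValueError("cannot standardize a constant feature")
    return [(x - mu) / sigma for x in values]


# Hypothetical features on very different scales
wave_height_m = [0.5, 1.0, 1.5, 2.0]
wave_period_s = [4.0, 6.0, 8.0, 10.0]
print(min_max_scale(wave_height_m))  # [0.0, 0.333..., 0.666..., 1.0]
print(z_score(wave_period_s))
```

Min-max scaling preserves the shape of the distribution but is sensitive to outliers. Standardization centres each feature at zero with unit variance.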

Statements

Author contributions

AP: Supervision, Compilation and Integration of Data, Data Curation, Software, Validation, Visualization, Writing—Review and Editing. MJ: Literature Search, Information Provision, Writing—Review and Editing. MB: Supervision, Writing—Review and Editing, Funding Acquisition, Project Administration. All authors contributed to the article and approved the submitted version.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Glossary

| Abbreviation/symbol | Definition |
|---|---|
| AI | Artificial Intelligence |
| ANFIS | Adaptive Network Based Fuzzy Inference Systems |
| ANN | Artificial Neural Network |
| ANN-MLP | Artificial Neural Network (Multilayer Perceptron) |
| ANN-RBF | Artificial Neural Network (Radial Basis Function) |
| ANOVA | Analysis of Variance |
| BMA | Bayesian Model Averaging method |
| CART | Classification And Regression Trees |
| CERC | Coastal Engineering Research Center |
| CNN | Convolutional Neural Networks |
| DA | Dimensional Analysis |
| DBSCAN | Density-Based Spatial Clustering of Applications with Noise |
| DEM | Digital Elevation Model |
| DFA | Dynamic Factor Analysis |
| DNN | Deep Neural Networks |
| DRL | Deep Reinforcement Learning |
| DT | Decision Tree |
| ELM | Extreme Learning Machine |
| ELMOR | Extreme Learning Machine for Ordinal Regression |
| Ens-ELM | Ensemble of Extreme Learning Machine |
| FIS | Fuzzy Inference System |
| FL | Fuzzy Logic |
| FUNWAVE | Fully Nonlinear Boussinesq Wave model |
| GAP-RBF | Growing and Pruning Radial Basis Function |
| GBM | Generalized Boosted Regression |
| GCM | Global Climate Model |
| GGA-ELM | Grouping Genetic Algorithm–Extreme Learning Machine approach |
| GIS | Geographic Information Systems |
| GP | Genetic Programming |
| GPR | Gaussian Process Regression |
| GRNN | General Regression Neural Networks |
| HBP | Hierarchical Bayesian Model |
| ICA | Independent Component Analysis |
| ICEEMDAN-ELM | Improved Complete Ensemble Empirical Mode Decomposition with Adaptive Noise-Extreme Learning Machine |
| ISODATA | Iterative Self-Organizing Data Analysis Technique |
| KDLOR | Kernel Discriminant Learning for Ordinal Regression |
| KNN | K-Nearest Neighbors |
| LASSO | Least Absolute Shrinkage and Selection Operator |
| LiDAR | Light Detection and Ranging |
| LM | Linear Model |
| LSTM | Long Short-Term Memory model |
| LSTM neural network | Long Short-Term Memory neural network |
| LSTR | Longshore Sediment Transport Rates |
| MAFA | Min/Max Autocorrelation Factor Analysis |
| MDP | Markov Decision Process |
| ML | Machine Learning |
| MLR | Multiple Linear Regression |
| MRAN | Minimal Resource Allocation Network |
| NLP | Natural Language Processing |
| NSWE | Nonlinear Shallow Water Equations |
| ONN | Ordinal Neural Networks |
| OS-ELM | Online Sequential Extreme Learning Machine |
| PCA | Principal Component Analysis |
| POM | Proportional Odds Model |
| Q-Learning | A model-free reinforcement learning algorithm |
| RCM | Regional Climate Model |
| RF | Random Forest |
| RL | Reinforcement Learning |
| RNN | Recurrent Neural Network |
| RT | Regression Tree |
| SLFN | Single Layer Feedforward Neural Network |
| SST | Sea Surface Temperature |
| SVM | Support Vector Machine |
| SVM (polynomial) | Support Vector Machine (Polynomial) |
| SVM-RBF | Support Vector Machine (Radial Basis Function) |
| SVOR-EX | Support Vector Ordinal Regression with Explicit constraints |
| SVOR-IM | Support Vector Ordinal Regression with Implicit constraints |
| SVR | Support Vector Regression |
| SWAN | Simulating WAves Nearshore model |
| TOL | Tolerance |
| VIF | Variance Inflation Factor |
| VR | Van Rijn formula |
| WB | Walton-Bruno formula |
| X | Vector of original data |
| μ | Mean |
| σ | Standard Deviation |
|  | Individual data point in X |
|  | Normalized data |
|  | Minimum value of X |
|  | Maximum value of X |

References

  • 1

    AbolfathiS.Yeganeh-BakhtiaryA.Hamze-ZiabariS. M.BorzooeiS. (2016). Wave runup prediction using M5′ model tree algorithm. Ocean. Eng.112, 7681. 10.1016/J.OCEANENG.2015.12.016

  • 2

    AdamE.MutangaO.OdindiJ.Abdel-RahmanE. M. (2014). Land-use/cover classification in a heterogeneous coastal landscape using RapidEye imagery: evaluating the performance of random forest and support vector machines classifiers. Int. J. Remote Sens.35, 34403458. 10.1080/01431161.2014.903435

  • 3

    AfsarianF.SaberA.PourzangbarA.OlabiA. G.KhanmohammadiM. A. (2018). Analysis of recycled aggregates effect on energy conservation using M5″ model tree algorithm. Energy156, 264277. 10.1016/j.energy.2018.05.099

  • 4

    AgarwalP.ManuelL. (2008). Extreme loads for an offshore wind turbine using statistical extrapolation from limited field data. Wind Energy11, 673684. 10.1002/we.301

  • 5

    AgrafiotisP.SkarlatosD.GeorgopoulosA.KarantzalosK. (2019). DepthLearn: learning to correct the refraction on point clouds derived from aerial imagery for accurate dense shallow water bathymetry based on SVMs-fusion with LiDAR point clouds. Remote Sens.11, 2225. 10.3390/rs11192225

  • 6

    AkbarifardS.RadmaneshF. (2018). Predicting sea wave height using Symbiotic Organisms Search (SOS) algorithm. Ocean. Eng.167, 348356. 10.1016/J.OCEANENG.2018.04.092

  • 7

    AliM.PrasadR. (2019). Significant wave height forecasting via an extreme learning machine model integrated with improved complete ensemble empirical mode decomposition. Renew. Sustain. Energy Rev.104, 281295. 10.1016/J.RSER.2019.01.014

  • 8

    AliM.PrasadR.XiangY.JameiM.YaseenZ. M. (2023). Ensemble robust local mean decomposition integrated with random forest for short-term significant wave height forecasting. Renew. Energy205, 731746. 10.1016/J.RENENE.2023.01.108

  • 9

    AltmanN. S. (1992). An introduction to kernel and nearest-neighbor nonparametric regression. Am. Statistician46, 175185. 10.1080/00031305.1992.10475879

  • 10

    AnderliniE.HusainS.ParkerG. G.AbusaraM.ThomasG. (2020). Towards real-time reinforcement learning control of a wave energy converter. J. Mar. Sci. Eng.8, 845. 10.3390/jmse8110845

  • 11

    ArslanO.AkyürekÖ.KayaŞ.ŞekerD. Z. (2020). Dimension reduction methods applied to coastline extraction on hyperspectral imagery. Geocarto Int.35, 376390. 10.1080/10106049.2018.1520920

  • 12

    Ba-AlawiA. H.VilelaP.Loy-BenitezJ.HeoS. K.YooC. K. (2021). Intelligent sensor validation for sustainable influent quality monitoring in wastewater treatment plants using stacked denoising autoencoders. J. Water Process Eng.43, 102206. 10.1016/j.jwpe.2021.102206

  • 13

    BabooS. S.TajudinK. (2013). “Clustering centroid finding algorithm (CCFA) using spatial temporal data mining concept,” in 2013 International Conference on Pattern Recognition, Informatics and Mobile Engineering 2013, Salem, India, February 21–22, 2013 (IEEE (Institute of Electrical and Electronics Engineers), 3036. 10.1109/ICPRIME.2013.6496443

  • 14

    BaiY.CaiW. J.HeX.ZhaiW.PanD.DaiM.et al (2015). A mechanistic semi-analytical method for remotely sensing Sea Surface pCO2 in river-dominated coastal oceans: a case study from the east China sea. J. Geophys. Res. Oceans120, 23312349. 10.1002/2014JC010632

  • 15

    BakhtyarR.GhaheriA.Yeganeh-BakhtiaryA.BaldockT. E. (2008). Longshore sediment transport estimation using a fuzzy inference system. Appl. Ocean Res.30, 273286. 10.1016/J.APOR.2008.12.001

  • 16

    BalogunA. L.AdebisiN. (2021). Sea level prediction using ARIMA, SVR and LSTM neural network: assessing the impact of ensemble ocean-atmospheric processes on models’ accuracy. Geomatics, Nat. Hazards Risk12, 653674. 10.1080/19475705.2021.1887372

  • 17

    BateniS. M.BorgheiS. M.JengD. S. (2007). Neural network and neuro-fuzzy assessments for scour depth around bridge piers. Eng. Appl. Artif. Intell.20, 401414. 10.1016/j.engappai.2006.06.012

  • 18

    BayramB.ErdemF.AkpinarB.InceA. K.BozkurtS.Catal ReisH.et al (2017). The efficiency of random forest method for shoreline extraction from landsat-8 and gokturk-2 imageries. ISPRS Ann. Photogrammetry, Remote Sens. Spatial Inf. Sci., 141145. 10.5194/isprs-annals-IV-4-W4-141-2017

  • 19

    BelgiuM.DrăguţL. (2016). Random forest in remote sensing: a review of applications and future directions. ISPRS J. Photogrammetry Remote Sens.114, 2431. 10.1016/J.ISPRSJPRS.2016.01.011

  • 20

    BerbićJ.OcvirkE.CarevićD.LončarG. (2017). Application of neural networks and support vector machine for significant wave height prediction. Oceanologia59, 331349. 10.1016/J.OCEANO.2017.03.007

  • 21

    BeuzenT.GoldsteinE. B.SplinterK. D. (2019). Ensemble models from machine learning: an example of wave runup and coastal dune erosion. Nat. Hazards Earth Syst. Sci.19, 22952309. 10.5194/nhess-19-2295-2019

  • 22

    BishopC. M.NasrabadiN. M. (2006). Pattern recognition and machine learning. Springer.

  • 23

    BooijN.HolthuijsenL. H.RisR. C. (1997). The swan wave model for shallow water. Coast. Eng. 1996. 10.1061/9780784402429.053

  • 24

    BowesB. D.TavakoliA.WangC.HeydarianA.BehlM.BelingP. A.et al (2021). Flood mitigation in coastal urban catchments using real-time stormwater infrastructure control and reinforcement learning. J. Hydroinformatics23, 529547. 10.2166/HYDRO.2020.080

  • 25

    BowesB. D.WangC.ErcanM. B.CulverT. B.BelingP. A.GoodallJ. L. (2022). Reinforcement learning-based real-time control of coastal urban stormwater systems to mitigate flooding and improve water quality. Environ. Sci. Water Res. Technol.8, 20652086. 10.1039/d1ew00582k

  • 26

    BreimanL. (2001). Random forests. Mach. Learn.45, 532. 10.1023/A:1010933404324

  • 27

    BrownJ. M.YellandM. J.PullenT.SilvaE.MartinA.GoldI.et al (2021). Novel use of social media to assess and improve coastal flood forecasts and hazard alerts. Sci. Rep.11, 13727. 10.1038/s41598-021-93077-z

  • 28

    ÇelikO. İ.GazioğluC. (2022). Coast type based accuracy assessment for coastline extraction from satellite image with machine learning classifiers. Egypt. J. Remote Sens. Space Sci.25, 289299. 10.1016/J.EJRS.2022.01.010

  • 29

    ChandolaV.BanerjeeA.KumarV. (2009). Anomaly detection: a survey. ACM Comput. Surv.41, 158. 10.1145/1541880.1541882

  • 30

    ChenJ.AbbadyS.DuggimpudiM. B. (2016). Spatiotemporal outlier detection: did buoys tell where the hurricanes were?Pap. Appl. Geogr.2, 298314. 10.1080/23754931.2016.1149874

  • 31

    ChenW.MaW. (2010). “Applications based on genetic neural network model of Lianyungang marine water quality optimization techniques and algorithms Technology”. 2010 International Conference of Information Science and Management Engineering 20101, 526529. 10.1109/ISME.2010.253

  • 32

    ChoH. Y.OhJ. H.KimK. O.ShimJ. S. (2013). Outlier detection and missing data filling methods for coastal water temperature data. J. Coast. Res.165, 18981903. 10.2112/si65-321.1

  • 33

    ChoiH. M.KimM. K.YangH. (2023). Deep-learning model for sea surface temperature prediction near the Korean Peninsula. Deep Sea Res. Part II Top. Stud. Oceanogr.208, 105262. 10.1016/J.DSR2.2023.105262

  • 34

    CiortanS.RusuE. (2018). Prediction of the wave power in the Black Sea based on wind speed using artificial neural networks. E3S Web Conf.51, 01006. 10.1051/e3scconf/20185101006

  • 35

    Copernicus (2023). The copernicus marine service. Available at: https://marine.copernicus.eu/about.

  • 36

    Cornejo-BuenoL.Nieto-BorgeJ. C.García-DíazP.RodríguezG.Salcedo-SanzS. (2016). Significant wave height and energy flux prediction for marine energy applications: a grouping genetic algorithm – extreme learning machine approach. Renew. Energy97, 380389. 10.1016/J.RENENE.2016.05.094

  • 37

    CortesC.VapnikV. (1995). Support-vector networks. Mach. Learn.20, 273297. 10.1023/A:1022627411411

  • 38

    CuadraL.Salcedo-SanzS.Nieto-BorgeJ. C.AlexandreE.RodríguezG. (2016). Computational intelligence in wave energy: comprehensive review and case study. Renew. Sustain. Energy Rev.58, 12231246. 10.1016/j.rser.2015.12.253

  • 39

    DarandaA.DzemydaG. (2020). Navigation decision support: discover of vessel traffic anomaly according to the historic marine data. Int. J. Comput. Commun. CONTROL15. 10.15837/IJCCC.2020.3.3864

  • 40

    DavidsonM. A.BirdP. A. D.BullockG. N.HuntleyD. A. (1996). A new non-dimensional number for the analysis of wave reflection from rubble mound breakwaters. Coast. Eng.28, 93120. 10.1016/0378-3839(96)00012-9

  • 41

    DemetriouD.MichailidesC.PapanastasiouG.OnoufriouT. (2021). Coastal zone significant wave height prediction by supervised machine learning classification algorithms. Ocean. Eng.221, 108592. 10.1016/j.oceaneng.2021.108592

  • 42

    den BiemanJ. P.WilmsJ. M.van den BoogaardH. F. P.van GentM. R. A. (2020). Prediction of mean wave overtopping discharge using gradient boosting decision trees. Water12. 10.3390/W12061703

  • 43

    DengY.YangJ.ZhaoW.LiX.XiaoL. (2016). Freak wave forces on a vertical cylinder. Coast. Eng.114, 918. 10.1016/j.coastaleng.2016.03.007

  • 44

    DeroliyaP.GhoshM.MohantyM. P.GhoshS.RaoK. D.KarmakarS. (2022). A novel flood risk mapping approach with machine learning considering geomorphic and socio-economic vulnerability dimensions. Sci. Total Environ.851, 158002. 10.1016/j.scitotenv.2022.158002

  • 45

    DezvarehR.ShafaghatM. (2020). Predicting the sediment rate of Nakhilo Port using artificial intelligence. Int. J. Coast. offshore Eng.4 (2), 4149. 10.22034/IJCOE.2020.149345

  • 46

    DiZ.ChangM.GuoP.LiY.ChangY. (2019). Using real-time data and unsupervised machine learning techniques to study large-scale spatio-temporal characteristics of wastewater discharges and their influence on surface water quality in the Yangtze River Basin. WaterSwitzerl.11, 1268. 10.3390/w11061268

  • 47

    DiesingM.StephensD. (2015). A multi-model ensemble approach to seabed mapping. J. Sea Res.100, 6269. 10.1016/j.seares.2014.10.013

  • 48

    DoganG.FordM.JamesS. (2021). “Predicting ocean-wave conditions using buoy data supplied to a hybrid RNN-LSTM neural network and machine learning models,” in Proceedings of the 2021 IEEE International Conference on Machine Learning and Applied Network Technologies, Soyapango, El Salvador, December 16–17, 2021 (IEEE (Institute of Electrical and Electronics Engineers)). ICMLANT 2021. 10.1109/ICMLANT53170.2021.9690528

  • 49

    Donnelly, J., Abolfathi, S., Pearson, J., Chatrabgoun, O., and Daneshkhah, A. (2022). Gaussian process emulation of spatio-temporal outputs of a 2D inland flood model. Water Res. 225, 119100. 10.1016/j.watres.2022.119100

  • 50

    Duong, N. T., Tran, K. Q., Luu, L. X., and Tran, L. H. (2023). Prediction of breaking wave height by using artificial neural network-based approach. Ocean. Model. 182, 102177. 10.1016/J.OCEMOD.2023.102177

  • 51

    El-Haddad, B. A., Youssef, A. M., Pourghasemi, H. R., Pradhan, B., El-Shater, A. H., and El-Khashab, M. H. (2021). Flood susceptibility prediction using four machine learning techniques and comparison of their performance at Wadi Qena Basin, Egypt. Nat. Hazards 105, 83–114. 10.1007/s11069-020-04296-y

  • 52

    El-Rahman, S. A. (2016). “Hyperspectral imaging classification using ISODATA algorithm: big data challenge,” in Proceedings - 2015 5th International Conference on e-Learning, Manama, Bahrain, October 18–20, 2015 (IEEE). 10.1109/ECONF.2015.39

  • 53

    Elsayed, S., Ibrahim, H., Hussein, H., Elsherbiny, O., Elmetwalli, A. H., Moghanm, F. S., et al. (2021). Assessment of water quality in Lake Qaroun using ground-based remote sensing data and artificial neural networks. Water 13, 3094. 10.3390/w13213094

  • 54

    Emmanouil, S., Aguilar, S. G., Nane, G. F., and Schouten, J. J. (2020). Statistical models for improving significant wave height predictions in offshore operations. Ocean. Eng. 206, 107249. 10.1016/j.oceaneng.2020.107249

  • 55

    Ennouali, Z., Fannassi, Y., Lahssini, G., Benmohammadi, A., and Masria, A. (2023). Mapping coastal vulnerability using machine learning algorithms: a case study at north coastline of Sebou estuary, Morocco. Regional Stud. Mar. Sci. 60, 102829. 10.1016/J.RSMA.2023.102829

  • 56

    Ester, M., Kriegel, H.-P., Sander, J., and Xu, X. (1996). “A density-based algorithm for discovering clusters in large spatial databases with noise,” in Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining.

  • 57

    Ewuzie, U., Aku, N. O., and Nwankpa, S. U. (2021). An appraisal of data collection, analysis, and reporting adopted for water quality assessment: a case of Nigeria water quality research. Heliyon 7, e07950. 10.1016/J.HELIYON.2021.E07950

  • 58

    Eyring, V., Bony, S., Meehl, G. A., Senior, C. A., Stevens, B., Stouffer, R. J., et al. (2016). Overview of the coupled model intercomparison project phase 6 (CMIP6) experimental design and organization. Geosci. Model. Dev. 9, 1937–1958. 10.5194/gmd-9-1937-2016

  • 59

    Fan, S., Xiao, N., and Dong, S. (2020). A novel model to predict significant wave height based on long short-term memory network. Ocean. Eng. 205, 107298. 10.1016/J.OCEANENG.2020.107298

  • 60

    Fernández, J. C., Salcedo-Sanz, S., Gutiérrez, P. A., Alexandre, E., and Hervás-Martínez, C. (2015). Significant wave height and energy flux range forecast with machine learning classifiers. Eng. Appl. Artif. Intell. 43, 44–53. 10.1016/J.ENGAPPAI.2015.03.012

  • 61

    Formentin, S. M., Zanuttigh, B., and Van Der Meer, J. W. (2017). A neural network tool for predicting wave reflection, overtopping and transmission. Coast. Eng. J. 59, 1750006-1–1750006-31. 10.1142/S0578563417500061

  • 62

    Freeman, B., Tang, Y., Huang, Y., and VanZwieten, J. (2021). Rotor blade imbalance fault detection for variable-speed marine current turbines via generator power signal analysis. Ocean. Eng. 223, 108666. 10.1016/j.oceaneng.2021.108666

  • 63

    Gandomi, M., Dolatshahi Pirooz, M., Varjavand, I., and Nikoo, M. R. (2020). Permeable breakwaters performance modeling: a comparative study of machine learning techniques. Remote Sens. 12, 1856. 10.3390/rs12111856

  • 64

    Gauci, A., Deidun, A., Abela, J., and Zarb Adami, K. (2016). Machine learning for benthic sand and maerl classification and coverage estimation in coastal areas around the Maltese Islands. J. Appl. Res. Technol. 14, 338–344. 10.1016/j.jart.2016.08.003

  • 65

    Gawehn, M., van Dongeren, A., de Vries, S., Swinkels, C., Hoekstra, R., Aarninkhof, S., et al. (2020). The application of a radar-based depth inversion method to monitor near-shore nourishments on an open sandy coast and an ebb-tidal delta. Coast. Eng. 159, 103716. 10.1016/J.COASTALENG.2020.103716

  • 66

    Goldstein, E. B., Coco, G., and Plant, N. G. (2019). A review of machine learning applications to coastal sediment transport and morphodynamics. Earth-Science Rev. 194, 97–108. 10.1016/j.earscirev.2019.04.022

  • 67

    Gomez, C. (2022). “Point-cloud technology for coastal and floodplain geomorphology,” in Point cloud technologies for geomorphologists: from data acquisition to processing (Springer), 53–81. 10.1007/978-3-031-10975-1

  • 68

    Günaydin, K. (2008). The estimation of monthly mean significant wave heights by using artificial neural network and regression methods. Ocean. Eng. 35, 1406–1415. 10.1016/J.OCEANENG.2008.07.008

  • 69

    Hagenaars, G., de Vries, S., Luijendijk, A. P., de Boer, W. P., and Reniers, A. J. H. M. (2018). On the accuracy of automated shoreline detection derived from satellite imagery: a case study of the Sand Motor mega-scale nourishment. Coast. Eng. 133, 113–125. 10.1016/J.COASTALENG.2017.12.011

  • 70

    Hall, J. W., Meadowcroft, I. C., Lee, E. M., and Van Gelder, P. H. A. J. M. (2002). Stochastic simulation of episodic soft coastal cliff recession. Coast. Eng. 46, 159–174. 10.1016/S0378-3839(02)00089-3

  • 71

    Hashemi, M. R., Ghadampour, Z., and Neill, S. P. (2010). Using an artificial neural network to model seasonal changes in beach profiles. Ocean. Eng. 37, 1345–1356. 10.1016/J.OCEANENG.2010.07.004

  • 72

    Hessami, M., Gachon, P., Ouarda, T. B. M. J., and St-Hilaire, A. (2008). Automated regression-based statistical downscaling tool. Environ. Model. Softw. 23, 813–834. 10.1016/J.ENVSOFT.2007.10.004

  • 73

    Heumann, B. W. (2011). An object-based classification of mangroves using a hybrid decision tree-support vector machine approach. Remote Sens. 3, 2440–2460. 10.3390/rs3112440

  • 74

    Hodge, V. J., and Austin, J. (2004). A survey of outlier detection methodologies. Artif. Intell. Rev. 22, 85–126. 10.1023/b:aire.0000045502.10941.a9

  • 75

    Hong, J.-H., Chiew, Y.-M., and Cheng, N.-S. (2013). Scour caused by a propeller jet. J. Hydraul. Eng. 139, 1003–1012. 10.1061/(asce)hy.1943-7900.0000746

  • 76

    Hoonhout, B. M., Radermacher, M., Baart, F., and van der Maaten, L. J. P. (2015). An automated method for semantic classification of regions in coastal images. Coast. Eng. 105, 1–12. 10.1016/j.coastaleng.2015.07.010

  • 77

    Hosseinzadeh, S., Etemad-Shahidi, A., and Koosheh, A. (2021). Prediction of mean wave overtopping at simple sloped breakwaters using kernel-based methods. J. Hydroinformatics 23, 1030–1049. 10.2166/hydro.2021.046

  • 78

    Hotelling, H. (1933). Analysis of a complex of statistical variables into principal components. J. Educ. Psychol. 24, 417–441. 10.1037/h0071325

  • 79

    Hua, X. G., Ni, Y. Q., Ko, J. M., and Wong, K. Y. (2007). Modeling of temperature–frequency correlation using combined principal component analysis and support vector regression technique. J. Comput. Civ. Eng. 21, 122–135. 10.1061/(asce)0887-3801(2007)21:2(122)

  • 80

    Huang, D., Zhao, D., Wei, L., Wang, Z., and Du, Y. (2015). Modeling and analysis in marine big data: advances and challenges. Math. Problems Eng. 2015, 1–13. 10.1155/2015/384742

  • 81

    Irish, J. L., and White, T. E. (1998). Coastal engineering applications of high-resolution lidar bathymetry. Coast. Eng. 35, 47–71. 10.1016/S0378-3839(98)00022-2

  • 82

    Izadi, M., Sultan, M., Kadiri, R. E., Ghannadi, A., and Abdelmohsen, K. (2021). A remote sensing and machine learning-based approach to forecast the onset of harmful algal bloom. Remote Sens. 13, 3863. 10.3390/rs13193863

  • 83

    Jain, P., and Deo, M. C. (2008). Artificial intelligence tools to forecast ocean waves in real time. Open Ocean Eng. J. 1, 13–20. 10.2174/1874835x00801010013

  • 84

    James, S. C., Zhang, Y., and O’Donncha, F. (2018). A machine learning framework to forecast wave conditions. Coast. Eng. 137, 1–10. 10.1016/j.coastaleng.2018.03.004

  • 85

    Jayaratne, M. P. R., Premaratne, B., Adewale, A., Mikami, T., Matsuba, S., Shibayama, T., et al. (2016). Failure mechanisms and local scour at coastal structures induced by tsunami. Coast. Eng. J. 58, 1640017-1–1640017-38. 10.1142/S0578563416400179

  • 86

    Jiang, T., Gradus, J. L., and Rosellini, A. J. (2020). Supervised machine learning: a brief primer. Behav. Ther. 51, 675–687. 10.1016/J.BETH.2020.05.002

  • 87

    Jirakittayakorn, A., Kormongkolkul, T., Vateekul, P., Jitkajornwanich, K., and Lawawirojwong, S. (2017). “Temporal kNN for short-term ocean current prediction based on HF radar observations,” in Proceedings of the 2017 14th International Joint Conference on Computer Science and Software Engineering, Nakhon Si Thammarat, Thailand, July 12–14, 2017 (IEEE). 10.1109/JCSSE.2017.8025921

  • 88

    Joyce, K. E., Fickas, K. C., and Kalamandeen, M. (2023). The unique value proposition for using drones to map coastal ecosystems. Camb. Prisms Coast. Futur. 1, e6. 10.1017/cft.2022.7

  • 89

    Kabiri-Samani, A. R., Aghaee-Tarazjani, J., Borghei, S. M., and Jeng, D. S. (2011). Application of neural networks and fuzzy logic models to long-shore sediment transport. Appl. Soft Comput. 11, 2880–2887. 10.1016/J.ASOC.2010.11.021

  • 90

    Kagemoto, H. (2020). Forecasting a water-surface wave train with artificial intelligence: a case study. Ocean. Eng. 207, 107380. 10.1016/J.OCEANENG.2020.107380

  • 91

    Kalkan, K., Bayram, B., Maktav, D., and Sunar, F. (2013). “Comparison of support vector machine and object based classification methods for coastline detection,” in International archives of the photogrammetry, remote sensing and spatial information sciences - ISPRS archives. 10.5194/isprsarchives-XL-7-W2-125-2013

  • 92

    Kaloop, M. R., Kumar, D., Zarzoura, F., Roy, B., and Hu, J. W. (2020). A wavelet-particle swarm optimization-extreme learning machine hybrid modeling for significant wave height prediction. Ocean. Eng. 213, 107777. 10.1016/J.OCEANENG.2020.107777

  • 93

    Kaplan, D., Muñoz-Carpena, R., and Ritter, A. (2010). Untangling complex shallow groundwater dynamics in the floodplain wetlands of a southeastern U.S. coastal river. Water Resour. Res. 46. 10.1029/2009WR009038

  • 94

    Karimpour, A., Chen, Q., and Twilley, R. R. (2016). A field study of how wind waves and currents may contribute to the deterioration of saltmarsh fringe. Estuaries Coasts 39, 935–950. 10.1007/s12237-015-0047-z

  • 95

    Kartal, S. (2023). Assessment of the spatiotemporal prediction capabilities of machine learning algorithms on sea surface temperature data: a comprehensive study. Eng. Appl. Artif. Intell. 118, 105675. 10.1016/J.ENGAPPAI.2022.105675

  • 96

    Kelleher, J. D., Mac Namee, B., and D’Arcy, A. (2015). Fundamentals of machine learning for predictive data analytics: algorithms, worked examples, and case studies.

  • 97

    Kim, H. D., and Aoki, S. I. (2021). Artificial intelligence application on sediment transport. J. Mar. Sci. Eng. 9, 600. 10.3390/jmse9060600

  • 98

    Kim, J., and Kim, J. (2020). Estimation of water surface flow velocity in coastal video imagery by visual tracking with deep learning. J. Coast. Res. 95, 522. 10.2112/SI95-101.1

  • 99

    Kim, J., Kim, J., Kim, T., Huh, D., and Caires, S. (2020). Wave-tracking in the surf zone using coastal video imagery with deep neural networks. Atmos. (Basel) 11, 304. 10.3390/atmos11030304

  • 100

    Kim, T., Kwon, Y., Lee, J., Lee, E., and Kwon, S. (2022). Wave attenuation prediction of artificial coral reef using machine-learning integrated with hydraulic experiment. Ocean. Eng. 248, 110324. 10.1016/J.OCEANENG.2021.110324

  • 101

    Kitsikoudis, V., Sidiropoulos, E., and Hrissanthou, V. (2015). Assessment of sediment transport approaches for sand-bed rivers by means of machine learning. Hydrological Sci. J. 60, 1566–1586. 10.1080/02626667.2014.909599

  • 102

    Knight, P. J., Bird, C. O., Sinclair, A., and Plater, A. J. (2020). A low-cost GNSS buoy platform for measuring coastal sea levels. Ocean. Eng. 203, 107198. 10.1016/J.OCEANENG.2020.107198

  • 103

    Kong, X., Sun, Y., Su, R., and Shi, X. (2017). Real-time eutrophication status evaluation of coastal waters using support vector machine with grid search algorithm. Mar. Pollut. Bull. 119, 307–319. 10.1016/J.MARPOLBUL.2017.04.022

  • 104

    Kramer, O. (2013). Dimensionality reduction with unsupervised nearest neighbors. Intell. Syst. Ref. Libr. 51. 10.1007/978-3-642-38652-7

  • 105

    Kroon, A., Larson, M., Möller, I., Yokoki, H., Rozynski, G., Cox, J., et al. (2008). Statistical analysis of coastal morphological data sets over seasonal to decadal time scales. Coast. Eng. 55, 581–600. 10.1016/j.coastaleng.2007.11.006

  • 106

    Kumar, N. K., Savitha, R., and Mamun, A. A. (2017). Regional ocean wave height prediction using sequential learning neural networks. Ocean. Eng. 129, 605–612. 10.1016/J.OCEANENG.2016.10.033

  • 107

    Kumar, N. K., Savitha, R., and Mamun, A. A. (2018). Ocean wave height prediction using ensemble of extreme learning machine. Neurocomputing 277, 12–20. 10.1016/J.NEUCOM.2017.03.092

  • 108

    Kuntoji, G., Rao, M., and Rao, S. (2020). Prediction of wave transmission over submerged reef of tandem breakwater using PSO-SVM and PSO-ANN techniques. ISH J. Hydraulic Eng. 26, 283–290. 10.1080/09715010.2018.1482796

  • 109

    Kuo, Y. M., Liu, W., Zhao, E., Li, R., and Muñoz-Carpena, R. (2019). Water quality variability in the middle and down streams of Han River under the influence of the Middle Route of South-North Water diversion project, China. J. Hydrology 569, 218–229. 10.1016/j.jhydrol.2018.12.001

  • 110

    Latif, S. D., Chong, K. L., Ahmed, A. N., Huang, Y. F., Sherif, M., and El-Shafie, A. (2023). Sediment load prediction in Johor River: deep learning versus machine learning models. Appl. Water Sci. 13, 7913. 10.1007/s13201-023-01874-w

  • 111

    Lazuardi, W., Ardiyanto, R., Marfai, M. A., Mutaqin, B. W., and Kusuma, D. W. (2021). Coastal reef and seagrass monitoring for coastal ecosystem management. Int. J. Sustain. Dev. Plan. 16, 557–568. 10.18280/IJSDP.160317

  • 112

    Lee, T.-L. (2004). Back-propagation neural network for long-term tidal predictions. Ocean. Eng. 31, 225–238. 10.1016/S0029-8018(03)00115-X

  • 113

    Li, S. P., and Wang, H. L. (2011). “Control strategy in coastal area using Markov chain and random forest,” in 2011 IEEE 18th International Conference on Industrial Engineering and Engineering Management, IE and EM 2011, Changchun, China, September 3–5, 2011 (IEEE). 10.1109/ICIEEM.2011.6035480

  • 114

    Liu, B., Yang, B., Masoud-Ansari, S., Wang, H., and Gahegan, M. (2021). Coastal image classification and pattern recognition: Tairua beach, New Zealand. Sensors 21, 7352. 10.3390/s21217352

  • 115

    Lou, R., Lv, Z., Dang, S., Su, T., and Li, X. (2021). Application of machine learning in ocean data. Multimed. Syst. 10.1007/s00530-020-00733-x

  • 116

    MacAyeal, D. R., Abbot, D. S., and Sergienko, O. V. (2011). Iceberg-capsize tsunamigenesis. Ann. Glaciol. 52, 51–56. 10.3189/172756411797252103

  • 117

    Mahjoobi, J., and Adeli Mosabbeb, E. (2009). Prediction of significant wave height using regressive support vector machines. Ocean. Eng. 36, 339–347. 10.1016/J.OCEANENG.2009.01.001

  • 118

    Mahjoobi, J., and Etemad-Shahidi, A. (2008). An alternative approach for the prediction of significant wave heights based on classification and regression trees. Appl. Ocean Res. 30, 172–177. 10.1016/J.APOR.2008.11.001

  • 119

    Mahmoodi, K., and Ghassemi, H. (2018). Outlier detection in ocean wave measurements by using unsupervised data mining methods. Pol. Marit. Res. 25, 44–50. 10.2478/pomr-2018-0005

  • 120

    Makarynskyy, O., Pires-Silva, A. A., Makarynska, D., and Ventura-Soares, C. (2005). Artificial neural networks in wave predictions at the west coast of Portugal. Comput. Geosciences 31, 415–424. 10.1016/J.CAGEO.2004.10.005

  • 121

    Martins, G. M., Thompson, R. C., Neto, A. I., Hawkins, S. J., and Jenkins, S. R. (2010). Enhancing stocks of the exploited limpet Patella candei d’Orbigny via modifications in coastal engineering. Biol. Conserv. 143, 203–211. 10.1016/j.biocon.2009.10.004

  • 122

    Masmoudi, O., Jaoua, M., Jaoua, A., and Yacout, S. (2021). Data preparation in machine learning for condition-based maintenance. J. Comput. Sci. 17, 525–538. 10.3844/JCSSP.2021.525.538

  • 123

    Meng, F., Song, T., Xu, D., Xie, P., and Li, Y. (2021). Forecasting tropical cyclones wave height using bidirectional gated recurrent unit. Ocean. Eng. 234, 108795. 10.1016/J.OCEANENG.2021.108795

  • 124

    Mészáros, L., van der Meulen, F., Jongbloed, G., and El Serafy, G. (2022). Coastal environmental and atmospheric data reduction in the Southern North Sea supporting ecological impact studies. Front. Mar. Sci. 9, 123. 10.3389/fmars.2022.920616

  • 125

    Miller, J. K., and Dean, R. G. (2007). Shoreline variability via empirical orthogonal function analysis: part II relationship to nearshore conditions. Coast. Eng. 54, 133–150. 10.1016/j.coastaleng.2006.08.014

  • 126

    Moncada, A. M., Melesse, A. M., Vithanage, J., and Price, R. M. (2021). Long-term assessment of surface water quality in a highly managed estuary basin. Int. J. Environ. Res. Public Health 18, 9417. 10.3390/ijerph18179417

  • 127

    Moody, D. I., Brumby, S. P., Rowland, J. C., and Altmann, G. L. (2014). Land cover classification in multispectral imagery using clustering of sparse approximations over learned feature dictionaries. J. Appl. Remote Sens. 8, 084793. 10.1117/1.jrs.8.084793

  • 128

    Najafi, M. R., Moradkhani, H., and Wherry, S. A. (2011). Statistical downscaling of precipitation using machine learning with optimal predictor selection. J. Hydrol. Eng. 16, 650–664. 10.1061/(asce)he.1943-5584.0000355

  • 129

    Nakamura, T., Kuramitsu, Y., and Mizutani, N. (2008). Tsunami scour around a square structure. Coast. Eng. J. 50, 209–246. 10.1142/S057856340800179X

  • 130

    Neshat, M., Abbasnejad, E., Shi, Q., Alexander, B., and Wagner, M. (2019). “Adaptive neuro-surrogate-based optimisation method for wave energy converters placement optimisation,” in Lecture notes in computer science (including subseries lecture notes in artificial intelligence and lecture notes in bioinformatics). 10.1007/978-3-030-36711-4_30

  • 131

    Neumann, B., Ott, K., and Kenchington, R. (2017). Strong sustainability in coastal areas: a conceptual interpretation of SDG 14. Sustain. Sci. 12, 1019–1035. 10.1007/s11625-017-0472-y

  • 132

    Nikoo, M. R., Kerachian, R., and Alizadeh, M. R. (2018). A fuzzy KNN-based model for significant wave height prediction in large lakes. Oceanologia 60, 153–168. 10.1016/J.OCEANO.2017.09.003

  • 133

    Oehmcke, S., Zielinski, O., and Kramer, O. (2015). “Event detection in marine time series data,” in Lecture notes in computer science (including subseries lecture notes in artificial intelligence and lecture notes in bioinformatics). 10.1007/978-3-319-24489-1_24

  • 134

    Pal, M., and Mather, P. M. (2003). An assessment of the effectiveness of decision tree methods for land cover classification. Remote Sens. Environ. 86, 554–565. 10.1016/S0034-4257(03)00132-9

  • 135

    Paradinas, L. M., James, N. A., Quinn, B., Dale, A., and Narayanaswamy, B. E. (2021). A new collection tool-kit to sample microplastics from the marine environment (sediment, seawater, and biota) using citizen science. Front. Mar. Sci. 8. 10.3389/fmars.2021.657709

  • 136

    Park, J., and Oh, J. (2022). Analysis of collected data and establishment of an abnormal data detection algorithm using principal component analysis and K-nearest neighbors for predictive maintenance of ship propulsion engine. Processes 10, 2392. 10.3390/pr10112392

  • 137

    Park, S. J., and Lee, D. K. (2020). Prediction of coastal flooding risk under climate change impacts in South Korea using machine learning algorithms. Environ. Res. Lett. 15, 094052. 10.1088/1748-9326/aba5b3

  • 138

    Pearson, K. (1901). LIII. On lines and planes of closest fit to systems of points in space. Lond. Edinb. Dublin Philosophical Mag. J. Sci. 2, 559–572. 10.1080/14786440109462720

  • 139

    Peña, E., Ferreras, J., and Sanchez-Tembleque, F. (2011). Experimental study on wave transmission coefficient, mooring lines and module connector forces with different designs of floating breakwaters. Ocean. Eng. 38, 1150–1160. 10.1016/j.oceaneng.2011.05.005

  • 140

    Pereira, G. C., and Ebecken, N. F. F. (2009). Knowledge discovering for coastal waters classification. Expert Syst. Appl. 36, 8604–8609. 10.1016/J.ESWA.2008.10.009

  • 141

    Pimentel, M. A. F., Clifton, D. A., Clifton, L., and Tarassenko, L. (2014). A review of novelty detection. Signal Process. 99, 215–249. 10.1016/j.sigpro.2013.12.026

  • 142

    Plaat, A., Kosters, W., and Preuss, M. (2023). High-accuracy model-based reinforcement learning, a survey. Artif. Intell. Rev. 56, 9541–9573. 10.1007/s10462-022-10335-w

  • 143

    Pourghasemi, H. R., Gayen, A., Park, S., Lee, C. W., and Lee, S. (2018). Assessment of landslide-prone areas and their zonation using logistic regression, LogitBoost, and naïve Bayes machine-learning algorithms. Sustainability 10, 3697. 10.3390/su10103697

  • 144

    Pourzangbar, A., and Brocchini, M. (2022). A new process-based, wave-resolving, 2DH circulation model for the evolution of natural sand bars: the role of nearbed dynamics and suspended sediment transport. Coast. Eng. 177, 104192. 10.1016/J.COASTALENG.2022.104192

  • 145

    Pourzangbar, A., Brocchini, M., Saber, A., Mahjoobi, J., Mirzaaghasi, M., and Barzegar, M. (2017a). Prediction of scour depth at breakwaters due to non-breaking waves using machine learning approaches. Appl. Ocean Res. 63, 120–128. 10.1016/j.apor.2017.01.012

  • 146

    Pourzangbar, A. (2012). Determination of the most effective parameters on scour depth at seawalls using genetic programming (GP). 10th Int. Conf. Coasts, Ports Mar. Struct. (ICOPMASS 2012).

  • 147

    Pourzangbar, A., Losada, M. A., Saber, A., Ahari, L. R., Larroudé, P., Vaezi, M., et al. (2017b). Prediction of non-breaking wave induced scour depth at the trunk section of breakwaters using genetic programming and artificial neural networks. Coast. Eng. 121, 107–118. 10.1016/j.coastaleng.2016.12.008

  • 148

    Pourzangbar, A., Saber, A., Yeganeh-Bakhtiary, A., and Ahari, L. R. (2017c). Predicting scour depth at seawalls using GP and ANNs. J. Hydroinformatics 19, 349–363. 10.2166/hydro.2017.125

  • 149

    Prata, J. C., da Costa, J. P., Duarte, A. C., and Rocha-Santos, T. (2019). Methods for sampling and detection of microplastics in water and sediment: a critical review. TrAC Trends Anal. Chem. 110, 150–159. 10.1016/J.TRAC.2018.10.029

  • 150

    Provost, E. J., Butcher, P. A., Coleman, M. A., and Kelaher, B. P. (2020). Assessing the viability of small aerial drones to quantify recreational fishers. Fish. Manag. Ecol. 27, 615–621. 10.1111/fme.12452

  • 151

    Qiao, X., Chu, T., Tissot, P., Ali, I., and Ahmed, M. (2023). Vertical land motion monitored with satellite radar altimetry and tide gauge along the Texas coastline, USA, between 1993 and 2020. Int. J. Appl. Earth Observation Geoinformation 117, 103222. 10.1016/J.JAG.2023.103222

  • 152

    Ramaswamy, S., Rastogi, R., and Shim, K. (2000). Efficient algorithms for mining outliers from large data sets. SIGMOD Rec. (ACM Spec. Interes. Gr. Manag. Data) 29, 427–438. 10.1145/335191.335437

  • 153

    Ranasinghe, R., Larson, M., and Savioli, J. (2010). Shoreline response to a single shore-parallel submerged breakwater. Coast. Eng. 57, 1006–1017. 10.1016/j.coastaleng.2010.06.002

  • 154

    Rao, S., and Mandal, S. (2005). Hindcasting of storm waves using neural networks. Ocean. Eng. 32, 667–684. 10.1016/J.OCEANENG.2004.09.003

  • 155

    Reggiannini, M., Papini, O., and Pieri, G. (2022). An automated analysis tool for the classification of sea surface temperature imagery. Pattern Recognit. Image Anal. 32, 631–635. 10.1134/S1054661822030336

  • 156

    Rengarajan, D., Vaidya, G., Sarvesh, A., Kalathil, D., and Shakkottai, S. (2022). “Reinforcement learning with sparse rewards using guidance from offline demonstration,” in ICLR 2022 - 10th International Conference on Learning Representations, Virtual, April 25–29, 2022.

  • 157

    Rizianiza, I., and Aisjah, A. S. (2015). “Prediction of significant wave height in the Java Sea using artificial neural network,” in Proceeding 2015 International Seminar on Intelligent Technology and Its Applications, ISITIA, Surabaya, Indonesia, May 20–21, 2015 (IEEE). 10.1109/ISITIA.2015.7219944

  • 158

    Rokni, K., Ahmad, A., Solaimani, K., and Hazini, S. (2015). A new approach for surface water change detection: integration of pixel level image fusion and image classification techniques. Int. J. Appl. Earth Observation Geoinformation 34, 226–234. 10.1016/J.JAG.2014.08.014

  • 159

    Ruiz de Alegría-Arzaburu, A., Pedrozo-Acuña, A., Horrillo-Caraballo, J. M., Masselink, G., and Reeve, D. E. (2010). Determination of wave-shoreline dynamics on a macrotidal gravel beach using canonical correlation analysis. Coast. Eng. 57, 290–303. 10.1016/j.coastaleng.2009.10.014

  • 160

    Sajjad, M., Lin, N., and Chan, J. C. L. (2020). Spatial heterogeneities of current and future hurricane flood risk along the U.S. Atlantic and Gulf coasts. Sci. Total Environ. 713, 136704. 10.1016/j.scitotenv.2020.136704

  • 161

    Sakaa, B., Elbeltagi, A., Boudibi, S., Chaffaï, H., Islam, A. R. M. T., Kulimushi, L. C., et al. (2022). Water quality index modeling using random forest and improved SMO algorithm for support vector machine in Saf-Saf river basin. Environ. Sci. Pollut. Res. 29, 48491–48508. 10.1007/s11356-022-18644-x

  • 162

    Santamaria Cervantes, M., Díaz-Carrasco, P., Moragues, M. V., Clavero, M., and Losada, M. Á. (2022). “Uncertainties of the actual engineering formulas for coastal protection slopes. The dimensional analysis and experimental method,” in Proceedings of the 39th IAHR World Congress From Snow to Sea, Granada, Spain, June 19–24, 2022 (IAHR). 10.3850/iahr-39wc252171192022900

  • 163

    Sarkar, D., Contal, E., Vayatis, N., and Dias, F. (2016). Prediction and optimization of wave energy converter arrays using a machine learning approach. Renew. Energy 97, 504–517. 10.1016/j.renene.2016.05.083

  • 164

    Sarkar, S., Gundecha, V., Ghorbanpour, S., Shmakov, A., Babu, A. R., Pichard, A., et al. (2022). “Skip training for multi-agent reinforcement learning controller for industrial wave energy converters,” in 2022 IEEE 18th International Conference on Automation Science and Engineering (CASE), Mexico City, Mexico, August 20–24, 2022 (IEEE), 212–219. 10.1109/CASE49997.2022.9926561

  • 165

    Scott, T., Masselink, G., and Russell, P. (2011). Morphodynamic characteristics and classification of beaches in England and Wales. Mar. Geol. 286, 1–20. 10.1016/j.margeo.2011.04.004

  • 166

    Sekovski, I., Stecchi, F., Mancini, F., and Del Rio, L. (2014). Image classification methods applied to shoreline extraction on very high-resolution multispectral imagery. Int. J. Remote Sens. 35, 3556–3578. 10.1080/01431161.2014.907939

  • 167

    Shafaghat, M., and Dezvareh, R. (2021). Support vector machine for classification and regression of coastal sediment transport. Arab. J. Geosci. 14, 2009. 10.1007/s12517-021-08360-0

  • 168

    Shamshirband, S., Mosavi, A., Rabczuk, T., Nabipour, N., and Chau, K. W. (2020). Prediction of significant wave height; comparison between nested grid numerical model, and machine learning models of artificial neural networks, extreme learning and support vector machines. Eng. Appl. Comput. Fluid Mech. 14, 805–817. 10.1080/19942060.2020.1773932

  • 169

    Shenbagaraj, N., Mani, N. D., and Muthukumar, M. (2014). ISODATA classification technique to assess the shoreline changes of Kolachel to Kayalpattanam coast. Int. J. Eng. Res. Technol. 3. 10.17577/IJERTV3IS040136

  • 170

    Shi, F., Kirby, J. T., Harris, J. C., Geiman, J. D., and Grilli, S. T. (2012). A high-order adaptive time-stepping TVD solver for Boussinesq modeling of breaking waves and coastal inundation. Ocean. Model. 43–44, 36–51. 10.1016/J.OCEMOD.2011.12.004

  • 171

    Shui, P. L., Xia, X. Y., and Zhang, Y. S. (2020). Sea-land segmentation in maritime surveillance radars via k-nearest neighbor classifier. IEEE Trans. Aerosp. Electron. Syst. 56, 3854–3867. 10.1109/TAES.2020.2981267

  • 172

    Shuvo, S. S., Yilmaz, Y., Bush, A., and Hafen, M. (2022). Modeling and simulating adaptation strategies against sea-level rise using multiagent deep reinforcement learning. IEEE Trans. Comput. Soc. Syst. 9, 1185–1196. 10.1109/TCSS.2021.3122282

  • 173

    Sierra, C., Flor-Blanco, G., Ordoñez, C., Flor, G., and Gallego, J. R. (2017). Analyzing coastal environments by means of functional data analysis. Sediment. Geol. 357, 99–108. 10.1016/j.sedgeo.2017.06.008

  • 174

    Smit, M. W. J., Aarninkhof, S. G. J., Wijnberg, K. M., González, M., Kingston, K. S., Southgate, H. N., et al. (2007). The role of video imagery in predicting daily to monthly coastal evolution. Coast. Eng. 54, 539–553. 10.1016/J.COASTALENG.2007.01.009

  • 175

    Soloy, A., Turki, I., Lecoq, N., Gutiérrez Barceló, Á. D., Costa, S., Laignel, B., et al. (2021). A fully automated method for monitoring the intertidal topography using video monitoring systems. Coast. Eng. 167, 103894. 10.1016/J.COASTALENG.2021.103894

  • 176

    Szmytkiewicz, M., Biegowski, J. X., Kaczmarek, L. M., Okrój, T., Ostrowski, R. X., Pruszak, Z., et al. (2000). Coastline changes nearby harbour structures: comparative analysis of one-line models versus field data. Coast. Eng. 40, 119–139. 10.1016/S0378-3839(00)00008-9

  • 177

    Tan, J., Chen, S., Lee, C. Y., Dong, G., Hu, W., and Wang, J. (2021). Projected changes of typhoon intensity in a regional climate model: development of a machine learning bias correction scheme. Int. J. Climatol. 41, 2749–2764. 10.1002/joc.6987

  • 178

    Tan, J., Liu, H., Li, M., and Wang, J. (2018). A prediction scheme of tropical cyclone frequency based on lasso and random forest. Theor. Appl. Climatol. 133, 973–983. 10.1007/s00704-017-2233-3

  • 179

    Tayfur, G., Karimi, Y., and Singh, V. P. (2013). Principle component analysis in conjuction with data driven methods for sediment load prediction. Water Resour. Manag. 27, 2541–2554. 10.1007/s11269-013-0302-7

  • 180

    Timmermans, B. W., Gommenginger, C. P., Dodet, G., and Bidlot, J. R. (2020). Global wave height trends and variability from new multimission satellite altimeter products, reanalyses, and wave buoys. Geophys. Res. Lett. 47. 10.1029/2019GL086880

  • 181

    Tsiakos, C. A. D., and Chalkias, C. (2023). Use of machine learning and remote sensing techniques for shoreline monitoring: a review of recent literature. Appl. Sci. 13, 3268. 10.3390/app13053268

  • 182

    Tsujimoto, G., Tamai, M., and Yamada, F. (2012). Long-term prediction of beach profile and sediment grain size characteristic at low energy beach. Coast. Eng. Proc. 1, 14. 10.9753/icce.v33.sediment.14

  • 183

    Turner, I. L., Harley, M. D., Almar, R., and Bergsma, E. W. J. (2021). Satellite optical imagery in coastal engineering. Coast. Eng. 167, 103919. 10.1016/J.COASTALENG.2021.103919

  • 184

    Uddin, M. G., Nash, S., Rahman, A., and Olbert, A. I. (2023). Performance analysis of the water quality index model for predicting water state using machine learning techniques. Process Saf. Environ. Prot. 169, 808–828. 10.1016/J.PSEP.2022.11.073

  • 185

    Uhl, F., Græsdal Rasmussen, T., and Oppelt, N. (2021). Classification ensembles for beach cast and drifting vegetation mapping with Sentinel-2 and PlanetScope. Geosciences 12, 15. 10.3390/geosciences12010015

  • 186

    van Gent, M. R. A., van den Boogaard, H. F. P., Pozueta, B., and Medina, J. R. (2007). Neural network modelling of wave overtopping at coastal structures. Coast. Eng. 54, 586–593. 10.1016/j.coastaleng.2006.12.001

  • 187

    Van Komen, D. F., Howarth, K., Neilsen, T. B., Knobles, D. P., and Dahl, P. H. (2022). A CNN for range and seabed estimation on normalized and extracted time-series impulses. IEEE J. Ocean. Eng. 47, 833–846. 10.1109/JOE.2021.3134719

  • 188

    Varalakshmi, P., Vasumathi, N., and Venkatesan, R. (2021). Tropical cyclone prediction based on multi-model fusion across Indian coastal region. Prog. Oceanogr. 193, 102557. 10.1016/j.pocean.2021.102557

  • 189

    Verwega, M. T., Trahms, C., Antia, A. N., Dickhaus, T., Prigge, E., Prinzler, M. H. U., et al. (2021). Perspectives on marine data science as a blueprint for emerging data science disciplines. Front. Mar. Sci. 8. 10.3389/fmars.2021.678404

  • 190

    Vos, K., Splinter, K. D., Harley, M. D., Simmons, J. A., and Turner, I. L. (2019). CoastSat: a Google Earth Engine-enabled Python toolkit to extract shorelines from publicly available satellite imagery. Environ. Model. Softw. 122, 104528. 10.1016/J.ENVSOFT.2019.104528

  • 191

    Wattelez, G., Dupouy, C., and Juillot, F. (2022). Unsupervised optical classification of the seabed color in shallow oligotrophic waters from Sentinel-2 images: a case study in the Voh-Koné-Pouembout lagoon (New Caledonia). Remote Sens. 14, 836. 10.3390/rs14040836

  • 192

    Wong, Y. J., Shimizu, Y., Kamiya, A., Maneechot, L., Bharambe, K. P., Fong, C. S., et al. (2021). Application of artificial intelligence methods for monsoonal river classification in Selangor river basin, Malaysia. Environ. Monit. Assess. 193, 438. 10.1007/s10661-021-09202-y

  • 193

    Xie, P., and Arkin, P. A. (1996). Analyses of global monthly precipitation using gauge observations, satellite estimates, and numerical model predictions. J. Clim. 9. 10.1175/1520-0442(1996)009<0840:AOGMPU>2.0.CO;2

  • 194

    Xu, L., Li, Q., Yu, J., Wang, L., Xie, J., and Shi, S. (2020). Spatio-temporal predictions of SST time series in China's offshore waters using a regional convolution long short-term memory (RC-LSTM) network. Int. J. Remote Sens. 41, 3368–3389. doi:10.1080/01431161.2019.1701724

  • 195

    Xu, X., Zhan, Y., Zheng, J., and Geng, B. (2021). "Classification of coastal altimetric waveforms using machine learning technology," in 2021 4th International Conference on Information Communication and Signal Processing, ICICSP 2021, Shanghai, China, September 24–26, 2021 (IEEE). doi:10.1109/ICICSP54369.2021.9611971

  • 196

    Yao, H., Yang, Y., Fu, X., and Mi, C. (2018). An adaptive sliding-window strategy for outlier detection in wireless sensor networks for smart port construction. J. Coast. Res. 82, 245–253. doi:10.2112/SI82-036.1

  • 197

    Yeganeh-Bakhtiary, A., Eyvazoghli, H., Shabakhty, N., Kamranzad, B., and Abolfathi, S. (2022). Machine learning as a downscaling approach for prediction of wind characteristics under future climate change scenarios. Complexity 2022. doi:10.1155/2022/8451812

  • 198

    Yeganeh-Bakhtiary, A., Ghorbani, M. A., and Pourzangbar, A. (2012). Determination of the most important parameters on scour at coastal structures. J. Civ. Eng. Urbanism 2, 68–71.

  • 199

    Yu, L., Sun, J., Guo, Y., Zhang, B., Yang, G., Chen, L., et al. (2022). Research on outlier detection in CTD conductivity data based on cubic spline fitting. Front. Mar. Sci. 9. doi:10.3389/fmars.2022.1030980

  • 200

    Zanuttigh, B., Formentin, S. M., and van der Meer, J. W. (2016). Prediction of extreme and tolerable wave overtopping discharges through an advanced neural network. Ocean. Eng. 127, 7–22. doi:10.1016/J.OCEANENG.2016.09.032

  • 201

    Zelada Leon, A., Huvenne, V. A. I., Benoist, N. M. A., Ferguson, M., Bett, B. J., and Wynn, R. B. (2020). Assessing the repeatability of automated seafloor classification algorithms, with application in marine protected area monitoring. Remote Sens. 12, 1572. doi:10.3390/rs12101572

  • 202

    Zhuang, X., Li, W., and Xu, Y. (2022). Port planning and sustainable development based on prediction modelling of port throughput: a case study of the deep-water Dongjiakou port. Sustainability 14, 4276. doi:10.3390/su14074276

  • 203

    Zimek, A., Schubert, E., and Kriegel, H. P. (2012). A survey on unsupervised outlier detection in high-dimensional numerical data. Stat. Analysis Data Min. 5, 363–387. doi:10.1002/sam.11161

  • 204

    Zou, S., Zhou, X., Khan, I., Weaver, W. W., and Rahman, S. (2022). Optimization of the electricity generation of a wave energy converter using deep reinforcement learning. Ocean. Eng. 244, 110363. doi:10.1016/J.OCEANENG.2021.110363

Keywords

machine learning, maritime modelling, classification, prediction, critical review

Citation

Pourzangbar A, Jalali M and Brocchini M (2023) Machine learning application in modelling marine and coastal phenomena: a critical review. Front. Environ. Eng. 2:1235557. doi: 10.3389/fenve.2023.1235557

Received

06 June 2023

Accepted

17 August 2023

Published

11 September 2023

Volume

2 - 2023

Edited by

Jan Hofman, University of Bath, United Kingdom

Reviewed by

Soroush Abolfathi, University of Warwick, United Kingdom

Yong Jie Wong, Kyoto University of Advanced Science, Japan

*Correspondence: Ali Pourzangbar,

Disclaimer

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.
