Diving Behavior of Mirounga leonina: A Functional Data Analysis Approach

The diving behavior of southern elephant seals, Mirounga leonina, is investigated through the analysis of time-depth dive profiles. The originality of this work is to consider dive profiles as continuous curves. For this purpose, a Functional Data Analysis (FDA) approach is proposed for the shape analysis of a collection of dive profiles. Complexity of dive shapes is characterized by a mixture of three main shape variations accounting for about 80% of the entire variability: U or V shape, vertical depth variability during the bottom time, and skewed left or right. Model-based clustering allows the identification of eight dive shape clusters in a quick and automated way. Connection between shape patterns and classical descriptors, as well as the number of prey capture events, is achieved, showing that the clusters are coherent to specific foraging behaviors previously identified in the literature labeled as drift, exploratory and active dives. Finally, FDA is compared to classical methods relying on the computation of discrete dive descriptors. Results show that taking the shape of the dive as a whole is more resilient to corrupted or incomplete sampled data. FDA is, therefore, an efficient tool adapted for processing and comparing dive data with different sampling frequencies and for improving the quality and the accuracy of transmitted data.


INTRODUCTION
In a global warming context, studying and understanding the foraging behavior of predators and their relationships with their surrounding environment is fundamental for conservation and management programs (Runge et al., 2014). According to optimal foraging theory, individuals are expected to minimize the time spent searching for and capturing food while maximizing their energy intake (Stephens and Krebs, 1986). Aquatic predators, such as birds, reptiles, and mammals that feed underwater but must breathe at the surface, are expected to optimize their capture success within the constraints of oxygen availability (Kooyman, 2012). For 20 years, the behavior of these specific animals has been widely studied with the implementation of tags that allow the sampling of numerous data both on their dive behavior, their physiology, and their biological and physical surrounding environment. Nowadays, bio-logging devices are the most appropriate tools to provide critical information on the diving and foraging of free-ranging animals (Evans et al., 2013). Technological advances have led to size and weight reduction of autonomous devices with gains in battery lifetime and memory size as well as new variables monitored. This emergence of new data sampled at high frequency (Block et al., 2011) enables the study of diving behaviors at various scales in time and space (Jonsen et al., 2007;Scheffer et al., 2012;Nowacek et al., 2016). Time Depth Recorders (TDRs) provide a large amount of data that are downloaded from sensors to computer systems where they are stored. In addition, many of these tags also sample environmental data (Harcourt et al., 2019). Despite concrete advances in the study of foraging behavior, a gap remains between fast advances in sampling technologies and adapted methods of data analysis required to face the increase in data complexity. Technological innovation and development of new measuring devices are much faster than the invention and validation of new methods for operational data processing. It is because innovative methods are often unknown to the scientists who build up the sampling design but also because the required methods are not yet available. New data leads, indeed, to problems that no one had ever thought of. Besides, many tags require on-board processing for satellite data transmission (Fedak et al., 2002). Energy costs impose constraints in this on-board processing to reduce computational time and to improve the quality of transmitted data while minimizing the amount of transmitted information. It is therefore necessary to have access to adapted methods to find the best compromises to satisfy these constraints.
One example of this fast increase is the biologging data volume provided by the long term monitoring of the deep and continuously diving southern elephant seal (Mirounga leonina) in the Southern Ocean as part of the National Observatory System MEMO (Mammals as Ocean Samplers) (Harcourt et al., 2019) or as a part of the Marine Mammals Exploring the Oceans Pole to Pole MEOP consortium (Treasure et al., 2017). These animals are equipped with a TDR tag and/or with a conductivitytemperature-depth Satellite Relay Data Logger (CTD-SRDL) tag measuring time-depth profiles (subsequently referred to as "SRDL"). Large numbers of deployments have given rise to large databases that provide access to sampled oceanographic variables (temperature, salinity, and fluorescence profiles) as well as movements of equipped individuals, especially dives (Fedak et al., 2001;Roquet et al., 2014). Figure 1 displays an example of time-depth dive profiles of southern elephant seals, Mirounga leonina. During offshore periods lasting 3-7 months, a female elephant seal performs quasi-continuous long deep dives (26 min in average, 432 m deep) punctuated by short breathing episodes at the surface (Hindell et al., 2016). The availability of high-frequency data combined with this continuous aspect of the dives lead us to consider that dives can be handled as curves. Data that arrive as observed curves are called functional data because they are observations of a variable (depth) indexed over a continuum (time). Time-depth dive profiles share common features: • They start from the surface and end at the surface. • The ending time and maximum depth are different between two dives. • An observed dive is constituted with numerous sampled points. • The number of sampled points changes between two dives.
• The sampled grid might not be equally spaced in time.
• An elephant seal can achieve several thousands of dives.
In their raw form, it is almost impossible to compare different dive shapes and even less to identify a small number of specific foraging behaviors.
Many studies deal with dive classification methods to organize dives in meaningful categories (Hindell et al., 1991;Campagna et al., 1995;Dragon et al., 2012). Classically, a finite set of descriptors, such as mean and maximum depth, duration of the descent, ascent, and bottom phases of the dive (see Hindell et al., 1991;Le Boeuf et al., 1993;Tremblay and Cherel, 2000;Watwood et al., 2006), are extracted from the high-frequency sampled dive data. These descriptors allow the construction of a table of observations and variables (the descriptors) over which standard multivariate analyses, such as reduction dimension methods (i.e., PCA), classification, or statistical tests (e.g., Random Forest, Thums et al., 2008;HMM, Ngô et al., 2019) will be performed.
This work proposes that we study the diving behavior of Mirounga leonina through the shape comparison of time-depth dive curves without the use of predefined numerical descriptors. Indeed, the choice of the descriptors relies on empirical and historical knowledge of the diving behaviors. Here, we argue that if the option of a finite number of well-suited descriptors is the predominant method to compare a collection of dives, it remains a partial description of the entire shape of the sampled curves. We assume that shape potentially contains all the possible descriptors of a dive curve. Therefore, this work first aims at comparing dive shape and numerical descriptors. In a second step, identified shape patterns in dive profiles will be connected with prey capture events to better understand foraging behaviors and the structure of the surrounding prey field.
After that, the document introduces the data obtained from southern elephant seal tags. In a second step, technical points for the transformation of discrete data to continuous functions are presented. We then propose some dimension reduction and classification methods adapted to data that are curves to compare and to characterize diving behaviors. A comparison with classical descriptor-based methods and tests for robustness is achieved with simulations on downsampled data. The ecological relevance of the results is finally discussed by comparing the characteristic shape to the number of prey capture events. The different steps of the method used in the article are summarized in Figure 2.

Processed Data
A total of 17 post-breeding southern elephant seal females, Mirounga leonina were tagged. Fieldwork was conducted at the Kerguelen Islands (Figure 3), from November to January 2010-2011 for three females, 2014-2015 for the six females, 2015-2016 for four females, and 2017-2018 for the four others. The seals were equipped with a TDR (Mk10, Wildlife Computer) recording time and depth at 0.5 or 1 Hz. For twelve females, the TDR was combined with an accelerometer. TDR-accelerometers sampled accelerations on three axes. The acceleration measured simultaneously on these three axes was used to estimate prey capture attempts (Viviant et al., 2010). Animals were captured and anesthetized using a 1:1 combination of tiletamine and zolazepam (Zoletil 100), which was injected intravenously (McMahon et al., 2000). After cleaning the fur with acetone, data loggers were glued on the head of the seals, using quick-setting epoxy (Araldite AW 2101). Therefore, the data loggers collected N = 87, 691 dives (Figure 1). All fieldwork

Dive Curve Construction
Let us consider a sampled dive. Data arrive as a collection of L pairs (t 1 , p 1 ), · · · , (t L , p L ) where t l is time (min), and p l the observed depth (m). We suppose that an observation p l of the diving depth at time t l may be well-represented by a continuous function π of time such that: where π is the deterministic part of the dive, and ε is a remainder. If the function π is well-chosen, it is hoped that this remainder is as small as possible. The form of π, which is directly connected to the shape of the dive, is expressed as a linear combination of K known basis functions φ 1 , · · · , φ K such that The basis functions φ k are chosen by the practitioner and are continuous functions of time. The shape of the function π is entirely determined by the knowledge of the coefficients α 1 , · · · , α K which are estimated using the data (t 1 , p 1 ), · · · , (t L , p L ) by penalized least squares regression (see

Supplementary Material).
This regression procedure is influenced by the following: • the choice of the very nature of the basis (i.e., Fourier basis, B-splines basis, and polynomial basis); • the choice of the number K of basis functions (i.e., the number of coefficients); and • the choice of additional constraints suggested by the data (or by some knowledge of the practitioner).
Many choices for basis functions are possible. Fourier basis, polynomial basis, splines, wavelets may be used. However, the choice for the basis has little influence for describing the shape of a dive (Ramsay and Silverman, 2005). In our case, cubic B-splines basis (piecewise polynomials of degree 3) is the most advisable choice as it combines fast computation capabilities and flexibility. This is a critical point for the onboard processing of data in satellite relayed data loggers. Although a small number K < L of basis functions is generally sufficient to capture the main variations in the shape of any dive, the choice of this number may have a significant influence on the shape of the estimated curves. The bigger K is, the closer the fit will be to the data, as usual in a regression problem (Figure 4).
However, the penalized regression procedure allows for getting rid of this choice by giving a compromise between the regularity of the obtained curve and its closeness to the data. This particular point will be illustrated in detail in the Result section.
Finally, a more detailed look at the data suggests that additional constraints must be included when searching for the best fit. Due to 0-offset correction, each dive starts from the surface [i.e., (t 1 , p 1 ) = (0, 0)], and ends at the surface [i.e., (t L , p L ) = (t M , 0) where t M is the duration of the dive]. The B-spline must reach the surface at least at the beginning and at the end of a dive, which does not occur with a non-constrained regression procedure. One other advantage of using penalized B-splines is that it is easy to include FIGURE 2 | Workflow of functional data analysis for the study of diving behavior of Mirounga leonina. Time-depth sampled profiles (a) can be approximated with continuous curves by a linear combination of K known functions (b). This first step reduces the dimensionality of the initial data and provides a new data frame. Observations are projected with PCA in a space of small dimensions using a number Q of principal component scores. Each component reflects a specific deformation of the shape of curves (c). By construction, deformations are independent, and their magnitude can be ordered according to a percentage of explained variance. PCA scores are used as inputs for model-based clustering in order to identify characteristic shapes of sampled curves (G1, G2, G3, and G4) (d). some additional pointwise constraints while keeping a simple solution to the regression problem (Ramsay and Silverman, 2005). These constraints can take various forms, especially if the curve must go through some points or if the derivative of the curve must have some values on some particular points (see Supplementary Material). Shape is all the geometrical information that remains when location, scale, and rotation effects are filtered out from an object (Dryden and Mardia, 2016). Therefore, the comparison between several diving shapes imposes that every estimated curves must start and finish at the same diving time. It is not the case as the diving time changes from dive to dive (Figure 1). In the same way, the shape of each curve must be independent of their maximum dive. These differences arise from physical and/or biological factors (prey density, frontal area, body condition). As shown in Figure 5, if care is taken to normalize diving curves by reducing both time and depth range to segment [0, 1], very different dives get the same shapes, which might highlight similar behaviors. Sampled data are then normalized before B-splines are computed, as shown in Figure 5.

Principal Component Analysis of Functional Data
Starting from the estimated curves π 1 , · · · , π N constructed as above, the aim is now to compare the shape of dives to identify a small number of dive patterns. The high number (N = 87, 691) of observed curves makes this task not easy. We propose the use of a functional PCA adapted to data that arrive as curves (FPCA). The PCA is similar to that performed on a data table crossing N individuals (the dives) and P variables except that variables are now B-splines coefficients (Ramsay and Silverman, 2005). In that case, the PCA loadings are curves instead of vectors (eigenfunctions instead of eigenvectors). The scores of FPCA allows representing observations that are curves in a space of small dimension. Finally, observations can be reconstituted in this space of small dimensions as a linear combination between some factors of variability (the eigenfunctions) and the scores according to a given amount of variability (see Supplementary Material).

Clustering
Characteristic curve clusters are identified with classification methods. Heuristic analyses of classical use (hierarchical clustering, k-means) may be a problem when determining the number of classes and when handling outliers (Melnykov and Maitra, 2010).
Model-based clustering (MBC) is a possible alternative to find a reasonable number of clusters and to deal with outliers. The idea behind MBC is that the sample of observations (dive profiles) arises from a distribution that is a mixture (a linear combination) of several components. Each of these components is associated with a distribution function and an FIGURE 5 | On the left, the representation of some dives (raw data). On the right, the representation of the same dives after normalization. Dives have different maximum depth, different duration, but dive patterns are similar. associated proportion in the mixture. In general, it is supposed that the distribution is gaussian multivariate. A cluster will be characterized by the parameters of the distributions (mean and variance) and by the weight (the proportion) associated with this distribution. The estimation of both distribution parameters and mixture probabilities is achieved by the expectationmaximization (EM) algorithm for maximum likelihood (Fraley and Raftery, 2002). The most suitable model and the number of clusters are determined using Integrated Complete-data Likelihood (Biernacki et al., 2000).
Moreover, MBC associates a measure of uncertainty when assigning an observation (a dive) to a given cluster. Observations detected as outliers are those for which the probability of belonging to any cluster is low, such as giving birth to a cluster of outliers. Notice that computational time is dependent on the dimension of the observations (K) and can be very expensive. One way to circumvent this problem is to achieve classification in the space of the first Q < K principal components of the FPCA, accounting for a sufficient amount of variability (let's say 90%).

Evaluation of the Robustness of the Approach
With remote sampled data, it often happens that the retrieved data are corrupted due to time-depth sensor problems. It can also happen that when tags are not recovered only a few points per dives are obtained by satellite transmission (Boehme et al., 2009). It raises several critical issues on the use of incomplete profiles, on how much results change when using downsampled data. Functional data analysis provides meaningful solutions to those problems.
The scores of an FPCA provide a useful pointwise representation of the relationships between observations that are curves. Test for robustness of Functional Data Analysis (FDA) is conducted with the following method. First, dive profiles are constructed using the whole set of observations, as described in section Dive Curve Construction. A reference FPCA is then achieved (see section Principal Component Analysis of Functional Data). Next, successive downsampling steps are performed by randomly removing an increasing number of pairwise observations (t, p), and dive profiles are re-estimated. Going through this process will modify the shape of the estimated dive profiles at each step. Comparing scores of successive FPCAs conducted on modified samples will help in understanding to which extent internal relationships between observations are changed. We place the downsampling step in the worst case to test the robustness of our method. One can also proceed by downsampling on a regular grid, which will lead to similar results.
In the same way, a reference PCA is achieved using the set of discrete descriptors computed on the raw non-normalized dive data: maximum depth, dive duration, bottom time duration, descent and ascent speed and the vertical sinuosity in the bottom phase (as in Hindell et al., 1991;Le Boeuf et al., 1993;Tremblay and Cherel, 2000;Watwood et al., 2006). The discrete descriptors are also computed when downsampling the raw data, and PCAs are achieved to compare the effects of sample modifications. In both cases (FPCA and standard PCA), each raw dive is damaged by removing a proportion of random points ranging from 20 to 90% of the available data.
One way to measure the deviation from the scores of the successive FPCA (resp. PCA scores) configurations to the scores of the reference FPCA (resp. PCA scores) configuration is to compute a distance between point clouds using the same number of relevant principal components. Because of the properties of any PCA, and thus FPCA, the sign of axes can be different from one FPCA to another. It is not advisable to compare separate FPCA coordinates straightforwardly. An appropriate distance is computed by keeping the reference configuration fixed and matching the others. This technique is termed Procrustes registration in the literature (Krzanowski, 2000). Once the Procrustes registration achieved, Krzanowski distance (Krzanowski, 2000) is computed between the reference FPCA and successive FPCAs on downsampled data using a fixed number of principal components accounting for a given amount of the variability (say 90%). This distance is also computed between the reference PCA on discrete descriptors and successive PCAs on downsampled data using the same number of principal components. Krzanowski distances must be normalized to compare the results obtained with FPCAs to those with PCAs. This normalization is obtained by dividing each FPCA (or PCA) scores by the entire variance of the corresponding point cloud. It reduces the variance (sum of squares of the PCA scores) of each obtained point cloud to the unit value (see Supplementary Material).

Choosing the Number K of Basis Functions
As previously mentioned (section Dive Curve Construction), the choice of the number K of basis functions may influence the accuracy of the fits when constructing curves from raw data. The larger K, the better data are fitted, but the fit may also include some noise or spurious local variations. On the other hand, if K is too small, some essential features of the curve can be missed (Figure 4). The choice of an optimal value of K is achieved using the same idea as to test the robustness of the FDA approach (section Evaluation of the Robustness of the Approach). First, a reference FPCA is carried out using a great number K = 50 of basis functions. The idea is now to achieve FPCAs on curves fitted when decreasing the number K to check in which extent internal relationships between observations are modified. The Krzanowski distance is computed by keeping the reference FPCA fixed and matching the others. Figure 6 displays variations of the distance between the reference FPCA configuration (K = 50) and other FPCAs when increasing the number K of basis functions. The first five principal components of each FPCA accounts for about 90% of the variability (88.2 ± 4.3%).
The Krzanowski distance is rapidly decreasing and vanishes to zero for K ≥ 8 of basis functions. It means that using more than eight basis functions has a few effects on the internal relationships between observations. For the following, an arbitrary number K = 20 is chosen as a reasonable number of basis functions.

Summary Statistics
Classical summary statistics can also be computed when considering data as curves. Figure 7A displays the mean curve given withπ where N is the number of sampled dives, and the standard deviation function computed with σ 2 (t) = 1 N N n=1 π n (t) −π(t) .
Mean dive gets a symmetrical U-shape with a maximum depth at 0.8 ( Figure 7A). When normalized, the maximum depth of each curve is 1. The mean curve of a set of dives with varying maximum depth (in time and magnitude) cannot reach a depth of 1. In the same way, the shape of the standard deviation function can be split into three parts: descent, bottom, and ascent phase. An increasing shape variability is observed during the descent. The bottom phase or bottom behavior phase is characterized by an equal vertical depth variability around 0.2, and this variability decreases with the ascent. As all dive curves start and finish at the surface, the variability vanishes to zero close to the surface. One can also use the correlation bivariate function to summarize the dependency of records across different values of normalized time.
At each point of the Figure 7B, it is possible to read the correlation for different pairs of time (t, s). The correlation is strong between the beginning and the end of dives (s = 0.05, t = 0.95). The white areas display weak negative correlations (lower than −0.2). The descent phase influences the bottom behavior. The more vertical the descent, the earlier maximum depth is reached.
The computation of these statistics is easy because each observed dive has been transformed into functional data over the same range [0; 1]. Without fitting the curves, it is not possible to compute these summary statistics using the raw data because the position and number of sampled points change between dives.

Shape Characterization and Comparison
The 2D mapping of the FPCA accounting for 65.1% of the entire variability shows the relative positions of each observation (N = 87, 691) depicting a dive, according to the two first factors (Figure 8). A kernel density estimation (gray areas) show that dives are roughly distributed in two groups characterized by curves (A) and (B). A practical way to understand how the FPCA shapes the structure of the point cloud is to represent the deformation generated by a factor (an eigenfunction) when moving along a factorial axis. Consider the mean curve π(t) projected at coordinate (0, 0) in Figure 8. Panels (C) and (D) show how that mean curve (red curve) is deformed when subtracting [panel (C), black curve] or adding [panel (D), black curve] the effect of the first factor ξ 1 (t) associated to the eigenvalue λ 1 such that π(t) ± λ 1 ξ 1 (t). (1) This first axis opposes V-shape observations [panel (C)] with square-shape observations [panel (D)]. Dives are ordered along   variability, and asymmetry), accounting for about 80% of the entire variability.

Pattern Identification and Relationship With Prey Capture Events (PCE)
Taking advantage of the FPCA dimension reduction, the MBC algorithm is realized in the space of the Q = 5 first principal components accounting for 87.3% of the entire variability.
Clustering in the space of principal components will make the composition of the clusters more robust if changes occur in sample composition. We identified a number of 8 clusters whose mean and quantile curves are represented in Figure 9 (outliers are excluded). According to Dragon et al. (2012), dive shapes can be categorized as drift, exploratory and active dives [respectively (A), (B), and (C)]. Note that drift dives are non-symmetric dives with low variability in opposition to the other clusters. Additionally, exploratory and active dives are distinguished by their shape: V-shape for the first and square-shape for the second. However, differences occur within the same category. These differences are represented by a slight modification of the global shape due to time spent at the bottom, or variability expressed by discrepancies of the quantile curves. Active dives are sorted according to the duration of the bottom time. The behavioral meanings of the dive clusters and their differences will be discussed in detail in the Discussion section. The dive clusters will, now, be labeled according to their position in Figure 9 (i.e., drift, exploratory 1, active 3).
Identified patterns of dive shapes can be connected to foraging behaviors through the number of prey capture events (PCE) during dives. As a reminder, PCE are estimated according to the analysis of TDR-accelerometer data (Viviant et al., 2010).
The distribution of prey capture events (PCE) varies clearly according to the 8 dive shapes categories, as illustrated in

Robustness of the Approach
We compare in this section, the efficiency of both classical PCA with numerical descriptors as variables and FPCA on dive curves through simulations of increasing downsampled datasets. Krzanowski distance is calculated between the reference FPCA (resp. PCA) and other FPCAs (resp. PCAs) on downsampled data (Figure 11).
In the functional case, the distance remains close to 0 up to 80% degradation. In the case where classical descriptors are used, the distance increases from the beginning to obtain a maximum at 90% degradation, which is much higher than in the functional case (0.26 compared with 0.07).

DISCUSSION
In this study, we propose some statistical methods to cluster dives into categories which are ecologically appropriate and consistent with previous dives classification methods by distinguishing resting, exploring, and foraging dives. The characterization of dive shape combined with PCE allows the identification of FIGURE 10 | Boxplots of the number of estimated prey capture events per cluster. Labels identify the three major categories of diving behaviors.
FIGURE 11 | Krzanowski distance calculated between reference PCA of raw dives and PCAs of downsampled dives in functional case (red line) and using classic descriptors (black dotted line). Each raw dive has been downsampled by removing a percentage of points ranging from 20 to 90% by 5%. Both distances have been normalized for the ease of comparison. particular foraging behaviors conditional to prey density. It therefore provides information on the spatial and temporal distribution of resources. The originality of this work is to consider the functional nature of the data: dives are curves, and the statistical methods include this property. It is then possible to study a sample of dive curves using the shape of the dives. Classification with MBC allows the identification of more diving behavior subcategories within each main diving categories proposed in the literature. These dive categories are ecologically relevant in terms of variation in the number of prey catch events. Even if many factors (e.g., environmental or physiological variables) can drive foraging behaviors, each dive category matches to a different level of prey catch levels. Furthermore, the proposed methods are more robust to downsampled data than previous classification methods based on numerical descriptors. In this study, the physical dimensions of dives (time and depth) are deliberately excluded by a normalization step when focusing on the shape analysis of dives. Putting dives back to their real dimensions is a mandatory task to make the connection between dive shape and real physical variables (related to depth and time mostly). In other words, the study of dive shape does not prevent the computation of classical numerical descriptors. However, dive shape analysis provides some additional information.
Dives are characterized by three major factors (section Shape Characterization and Comparison). The first one is associated with variations in the proportion of bottom time (U or V shape), irrespective of the absolute bottom time, a usual descriptor specific to each dive. The second one is associated with variations in vertical movements during the bottom time, as defined as sinuosity, a numerical descriptor as in Bailleul et al. (2008). The third one is attached to variations in asymmetry in shape, the equivalent of which is not available as a numerical descriptor. This asymmetry is found in many diving species: pinnipeds, seabirds, and seals (Schreer et al., 2001). The role and the origin of this asymmetry are manifold. In some cases, asymmetry is an indicator of body conditions (i.e., buoyancy). Vertical ascent speed tends to be more constrained than descent speed, likely in response to ecophysiological constraints to reduce the risk of decompression accident (see Richard et al., 2014). The proposed methods are, therefore, able to identify the importance of this asymmetry factor in an automated way.
Looking now at the clustering, eight dive clusters have been labeled as drift, exploratory, and active (Figure 9). These groups own characteristic shapes, which are a mixture of the three previous factors identified with FPCA accounting for about 80% of the variability between curves. Among then, the first category of dives corresponds to drift dives or resting/food processing dives (Crocker et al., 1997;Mitani et al., 2010). Exploratory dives are traveling and/or observation dives. Active dives are associated with foraging because individuals optimize the duration of bottom time to increase overall foraging success (Hindell et al., 1991;Fedak et al., 2001). However, the analysis of dive shapes suggests that it is more a question of optimizing the proportion of bottom time (according to the duration of the dive) more than the absolute bottom time. It could provide support for a non-linear relationship between search time and prey acquisition (Thums et al., 2012). The vertical depth variability (i.e., sinuosity) during the bottom phase of active dive clusters expresses movements related to prey captures. These two criteria have a real and well-known behavioral meaning. A proportion of 72-76% of the feeding events takes place during the bottom phase of the dive (Guinet et al., 2014;Jouma'a et al., 2016). Le  also shows that the higher the sinuosity of the bottom phase of the dive over a narrower depth range (i.e., a limited depth range), the higher the number of PCE. The different classes of foraging dives are mainly driven by the vertical accessibility of the prey (close to the surface to deep) and their distribution within the water column (aggregated or dispersed over narrow or large layers). Most of the dive range bottom duration time of southern elephant seals tends to increase with prey encounter (Jouma'a et al., 2016). Therefore, the variation in southern elephant seal behavior at the bottom of their foraging dive constitutes, as we should expect, a response to the variation of the prey distribution with an increase in PCE during foraging dives. It is reflected by the variation of the number of PCE for each dive cluster. Foraging dives, that have the longest bottom time, have the highest number of PCE (boxplots in Figure 10). The differences in the variability, series of changes in depth, underline the link between foraging behavior and spatial distribution of prey patches. Active 1 class has less variability but more PCE. It can be assumed that more dense, less dispersed patches allow individuals a higher number of captures.
Characteristic shapes in clusters can be compared to numerical descriptors computed from raw data ( Table 1). The relative proportion of each group varies, with active dives being the most represented and drift dives the least. Active 1 and 2 are the shallowest. Active 3 and 4 are deepest. There are, therefore, two groups of active dive clusters: shallow active (1 and 2) with more PCE, and deep active (3 and 4). The longer the duration of the bottom time, the shallower the dives, and the higher the number of PCE. Active dives maximize foraging time by minimizing descent and ascent time.
Except exploratory 1 cluster, exploratory dives are the deepest, and they have the lowest duration of bottom time. Exploratory 1 dives have a skewed left shape, and the other exploratory dives have a V-shape. These are observation and/or traveling dives. Drift dives have a reasonable maximum depth with low descent and ascent speed. Individuals make little effort during these dives. In addition, the number of PCE is close to 0 (Figure 10). It is in line with the hypothesis that these dives are food processing and/or resting dives. Even if they are scarce, these drift dives are very well and identified in an automated way using the FDA approach.
During a foraging migration of an animal, it happens that the retrieved data are damaged due to sensor problems. Furthermore, when tags are not recovered, only a few depth points per dives are transmitted by satellite (Boehme et al., 2009). Classically, the transmitted depth points are selected through a broken-stick approach (Heerah et al., 2014). The FDA approach can be useful to optimize the transmitted amount of data. For instance, it is possible to transmit fewer basis coefficients to reconstruct the entire profile easily. It is also possible to optimize the position of spline knots to get an accurate fitted profile with few coefficients. The FDA approach is also suitable for processing and comparing dive data with different sampling frequencies. This particular point has been illustrated through the analysis of simulated data sets. Our results show that the functional data analysis is more resilient to corrupted data compare to classification methods relying on discrete dive descriptors. Furthermore, the FDA computation time is significantly shorter than the computation time with the broken-stick algorithm. The computation time for the FDA approximation of N = 87, 691 entire curves was 357 s against 4, 404 s for the broken-stick algorithm on a 2.7 GHz processor. The broken-stick algorithm gives only 6 pairs of time-depth points, whereas the FDA approximation is pretty good, taking a basis expansion with six coefficients as previously shown in Figure 3. This information is relevant when aiming at reducing computation time and, therefore, energy consumption for onboard processing and real-time transmission of dive profiles.

CONCLUSIONS AND PERSPECTIVES
This study demonstrates the potentiality of the FDA method for the characterization and the classification of time-depth dive profiles. Using the shape of dive profiles enables us to obtain, in a very visual way, results that are ecologically consistent and coherent with previous dive classification methods implemented on elephant seal diving data. Moreover, diving behaviors of Mirounga leonina are straightforwardly related to a small number of independent factors connected to shape variations in dives and to account for most of the entire variability in curve shape. The processed data set is large (N = 87, 691 dives), and FDA makes it possible to process large amounts of data automatically and quickly. Such an approach will be relevant to conduct analyses, such as inter-annual or longer-term variations in the diving behavior of the elephant seal. Besides, the proposed methods allow the efficient discrimination of drift dives used to monitor the change of buoyancy, and therefore, body condition of the seal during their migration. Studying the ratio between buoyancy (obtained thanks to the asymmetry of dives with principal component scores) and the number of active dives would highlight the foraging efficiency of individuals in geographical areas of interest. The use of FDA allows the correct processing of downsampled data. It is possible to process low frequency transmitted data from unrecovered tags and to compare them with high-frequency data. It is a real advantage if one considers the possibility to compare high-frequency current data to less accurate historical recorded data. Finally, the new results of this study raise further questions on the origin of the diving behaviors and on the forces that shape them. The next promising step is to integrate environmental covariates (temperature, salinity, and chlorophyll) that can also be considered as profiles better to understand links between dive curve shapes and behaviors.

DATA AVAILABILITY STATEMENT
The dataset analyzed for this study can be found in the MEOP-CTD database (Treasure et al., 2017), http://www.meop.net/.

ETHICS STATEMENT
The animal study was reviewed and approved by the TAAF Ethics Committee (Comite Environnement et Préfet des Terres Australes et Antarctiques Francaises).

AUTHOR CONTRIBUTIONS
MG conceived the idea and wrote the manuscript. MG, CM, and DN analyzed the data and prepared the figures. CM, DN, CG, and BP corrected the manuscript and helped for important results interpretations. MG, CG, and BP carried out the fieldwork. CG and BP provided additional data. All authors contributed to the article and approved the submitted version.

FUNDING
The Ph.D. scholarship of MG was funded by the French Ministry for Education and Research.