Assessing Alternative Microscopy-Based Approaches to Species Abundance Description of Intertidal Diatom Communities

Diatoms usually dominate microphytobenthic biofilms in coastal and estuarine intertidal environments. Yet, functional studies on biofilms often skip species analysis because benthic diatoms are notoriously difficult to extract from sediments and challenging to identify at that taxonomic level. Valid, less time-consuming alternatives would surely be welcomed and increase the inclusion of community structure information in microphytobenthos (MPB) ecophysiological studies. Starting with an original 181-species abundances matrix (OSM), obtained during a 2-year spatial–temporal survey in a Tagus Estuary intertidal flat with contrasting sediment textures, the current study assessed the effectiveness of several approaches to species abundances analysis. The effect of excluding abundance data or rare species, the influence of taxonomic resolution, or the use of size-based metrics on biotic multivariate patterns was examined by an objective comparison that replicated these different approaches on three different levels: (1) inter-matrix correlations, (2) performance in several non-parametric multivariate analyses (ANOSIM, MDS), and (3) correlations with the environmental dataset. When compared with the OSM, all matrices had strong or very strong positive correlations. All discriminated successfully spatial patterns, separating well assemblages from sandy and muddy sediments, and all had significant correlations with the environmental dataset. Apart from the relative biovolume species matrix (BSM), only the species matrices were able to discriminate significantly temporal patterns. The exclusion of the rarest species (48% of total) had a negligible effect, with the common and original species abundances matrices having a ρ > 0.99 correlation. Of the alternative approaches to species abundances, species presence/absence and the genera abundances matrices yielded the best results overall. Genera presence/absence and the size-class matrices had intermediate performances, with the former performing comparatively poorly with regard to seasonal patterns. BSM had the lowest correlation with the environmental variable dataset (ρ = 0.598) and the worst overall performance in the other multivariate routines. This means that either a high-taxonomic resolution qualitative analysis (i.e. species presence/absence) or, in alternatively, a genus-level analysis retaining abundance data may be sufficient to describe basic spatial differences in estuarine intertidal flats. However, if seasonal variations in mudflat diatom assemblage structure are to be detected, species-level abundance data are still necessary.

Diatoms usually dominate microphytobenthic biofilms in coastal and estuarine intertidal environments. Yet, functional studies on biofilms often skip species analysis because benthic diatoms are notoriously difficult to extract from sediments and challenging to identify at that taxonomic level. Valid, less time-consuming alternatives would surely be welcomed and increase the inclusion of community structure information in microphytobenthos (MPB) ecophysiological studies. Starting with an original 181species abundances matrix (OSM), obtained during a 2-year spatial-temporal survey in a Tagus Estuary intertidal flat with contrasting sediment textures, the current study assessed the effectiveness of several approaches to species abundances analysis. The effect of excluding abundance data or rare species, the influence of taxonomic resolution, or the use of size-based metrics on biotic multivariate patterns was examined by an objective comparison that replicated these different approaches on three different levels: (1) inter-matrix correlations, (2) performance in several non-parametric multivariate analyses (ANOSIM, MDS), and (3) correlations with the environmental dataset. When compared with the OSM, all matrices had strong or very strong positive correlations. All discriminated successfully spatial patterns, separating well assemblages from sandy and muddy sediments, and all had significant correlations with the environmental dataset. Apart from the relative biovolume species matrix (BSM), only the species matrices were able to discriminate significantly temporal patterns. The exclusion of the rarest species (48% of total) had a negligible effect, with the common and original species abundances matrices having a ρ > 0.99 correlation. Of the alternative approaches to species abundances, species presence/absence and the genera abundances matrices yielded the best results overall. Genera presence/absence and the size-class matrices had intermediate performances, with the former performing comparatively poorly with regard to seasonal patterns. BSM had the lowest correlation with the environmental variable dataset (ρ = 0.598) and the worst overall performance in the other multivariate routines. This means that either a high-taxonomic resolution

INTRODUCTION
Diatoms are usually the most ubiquitous and dominant microalgal component of the microphytobenthos (MPB) communities in intertidal estuarine and coastal areas (MacIntyre et al., 1996;Hamels et al., 1998;Méléder et al., 2007). They form dense biofilms on the sediment surface during low tides (Consalvey et al., 2004) and, through the production of extracellurar polymeric substances (EPS), play a pivotal role in intertidal sediment stabilization (Stal, 2010;Passarelli et al., 2014). Since Admiraal (1984) published his classic review on the ecology of estuarine sediment-inhabiting diatoms, research on MPB has greatly expanded in many different fields, such as nutrient cycling (Cabrita and Brotas, 2000), carbon transfers (Middleburg et al., 2000) or benthic-pelagic coupling (de Jonge and van Beusekom, 1995;Hernández Fariñas et al., 2017), biofilm vertical migration, and diatom photoprotective mechanisms (Van Colen et al., 2014;Marques da Silva et al., 2017). While the knowledge on the MPB functional aspects has improved decisively during that period, the structural aspects of these diatom-dominated communities (i.e. species taxonomy, distribution, and diversity) have been more scantily studied and only represent 20% of overall MPB publications of the last 30 years (Park et al., 2014).
This trend is a consequence of the inherent difficulty of sampling and identifying marine and coastal benthic diatoms, which is rooted in many causes: (1) benthic microalgae are notoriously difficult to extract from the sediment (Muylaert et al., 2002); (2) a paucity of comprehensive taxonomic monographs (Sullivan and Currin, 2002;Trobajo and Sullivan, 2010) makes species identification and intercomparison between studies problematic (Underwood and Barnett, 2006); (3) the number experienced diatomists (i.e. phycologists specialized in diatom taxonomy) that routinely work on MPB assemblages in the last decades is only a small fraction of the ones working in freshwater benthic systems or with coastal phytoplankton (Ribeiro, 2010). This means that research topics that rely on sound specieslevel identification and cell counts, such as distributional studies (Underwood et al., 1998) or the establishment of a diatom-based water quality index for estuaries, remain seriously underdeveloped (Trobajo and Sullivan, 2010).
Not surprisingly, current MPB functional studies tend to ignore species composition altogether and focus only on the MPB biofilm, in a "black-box" approach (Kociolek and Stoermer, 2001a;Underwood, 2005). Ideally, both structural and functional attributes of intertidal diatom assemblages should be investigated simultaneously (McIntire and Moore, 1977;Kociolek and Stoermer, 2001b) and there are recent examples where the ecophysiology of the biofilms was complemented with their species composition (Serôdio et al., 2012;Vieira et al., 2013;Cartaxana et al., 2015). However, there are often limitations that impede MPB ecologists from pursuing complementary detailed community descriptions to their main research aims, such as time, budget, and/or expert availability for collaborations. Therefore, alternatives to very labor-intensive species abundances datasets would surely be welcomed by MPB researchers eager to add a realistically achievable structural description of the diatom assemblages which could lead to a better understanding of the biofilm processes they are studying.
To our knowledge, this question has not been addressed within the context of MPB research (Ribeiro, 2010), although it has been consistently explored by community ecologists that rely on benthic invertebrate assemblages (Jones, 2008), phytoplankton (Carneiro et al., 2010), or freshwater diatoms from lentic and lotic systems (Kelly, 2013) as biological quality elements in environmental monitoring and bioassessment programs. Consequently, there has been a strong incentive to find more cost-effective alternatives to species-level analysis. Most of the discussion concerns taxonomic sufficiency, which is the minimum taxonomic resolution needed to meet the study objective (Ellis, 1985). According to this concept, only a small set of species (e.g. Clarke and Warwick, 1998) or easier-to-identify higher taxonomical levels (e.g. Olsgard et al., 1997;Lavoie et al., 2009;Menezes et al., 2010;Rimet and Bouchez, 2012a) may be required to adequately describe community patterns and their responses to natural or anthropogenic disturbances. Several other aspects concerning community description have also been pursued, such as: (1) the consequences of different transformations of abundances-bysample data in multivariate analysis (e.g. Thorne et al., 1999;Heino, 2008); (2) the suitability of qualitative, presence/absence species datasets (Carballo and Naranjo, 2002); (3) the effect of rare species exclusion (e.g. Cao and Williams, 1999;Marchant, 1999); (4) the importance of cell size (Wunsam et al., 2002;Lavoie et al., 2006Lavoie et al., , 2010; and (5) the use of life-forms or ecological guilds as functional surrogates to taxonomic units (Simberloff and Dayan, 1991;Passy, 2007;Rimet and Bouchez, 2012b).
All these avenues warrant further investigations using intertidal diatom assemblage data. The works by Somerfield and Clarke (1995) and Olsgard et al. (1997Olsgard et al. ( , 1998 on marine macrofauna communities seem to provide a useful framework to do so. In this study, we tested the principle of taxonomic sufficiency on a dataset with a high taxonomic resolution collected by Ribeiro (2010). This dataset encompasses diatom communities in contrasting intertidal flats and provided an FIGURE 1 | Study area. Sampling stations identified by the letters in the box, gray areas represent intertidal areas.
opportunity to evaluate and test several alternative approaches to species-level community description in MPB research.

MATERIALS AND METHODS
Detailed descriptions of the study site, sampling procedures, environmental parameters, and diatom analysis (e.g. MPB extraction, slide preparation, and cell counts) are given elsewhere (Jesus et al., 2006(Jesus et al., , 2009Ribeiro, 2010;Ribeiro et al., 2013). Sampling was conducted between 2003 and 2004 in a total of 12 bimonthly sampling campaigns. The sampling area consisted of two transects on the eastern shore of the Tagus estuary, each transect with three sampling stations running perpendicular to the shore (Figure 1). Sampling stations could be divided into three main groups, regarding their sediment texture: station A1 had fine and medium sands with almost no mud content; three stations were mainly composed by medium and coarse sands with an average mud fraction between 5 and 14% from the sandiest to the muddiest (i.e. A2 and A3 to V1, respectively); and two stations (V2 and V3) were muddy and had almost no sand content (Jesus et al., 2006;Ribeiro et al., 2013). Ribeiro (2010) and Ribeiro et al. (2013) described in detail the spatial-temporal variation of this intertidal diatom community, its diversity patterns, and other features related to community physiognomy, such as life-form and size-class distributions. The original species abundance matrix (OSM) is composed by a total of 181 diatom taxa that were identified and counted in the 68 collected samples (Ribeiro, 2010; see Supplementary Material for original matrix). The abundances were standardized and are presented as relative percentages. All the derived matrices used in this study stem from the OSM, either by data transformation, species selection, and/or species abundance aggregation, with the objective of evaluating and comparing their performance using non-parametric multivariate tools, found in PRIMER R 6 software package (Clarke and Gorley, 2006). They are listed, together with their composition, type, transformation of data, and aims, in Table 1. They can be divided in two main groups that represent the two different facets of community structure analysis that are explored within the framework of the current study, namely metrics selection and taxonomic sufficiency.

Metrics Selection
These matrices retained their species-level resolution in order to test the effect of two main data transformations. Transformations downweigh the influence of dominant taxa to varying degrees (Field et al., 1982). OSM abundance data suffered the following transformations: (1) presence/absence. This transformation downweighs completely the species abundance data and shifts the emphasis to changes in taxonomic composition only (Clarke and Warwick, 2001). The aim of the presence/absence species matrix (0-1 SM) is to replicate an approach that only studies the assemblages qualitatively and, consequently, saves the time needed for cell counting; (2) biovolume. The biovolume species matrix (BSM) is based on the relative contribution of each taxon to the total biovolume of the assemblage (cf. Haubois et al., 2005) and can also be considered a severe data transformation. For each sample, the weighted relative biovolume of each taxon derived from its relative abundance and from its previously calculated median biovolume. This approach counters the over-estimation of the abundant small diatoms, which may actually contribute to a small fraction of the overall biomass (Hillebrand et al., 1999), while still retaining a high taxonomical resolution . Cell biovolume calculation was based on equations proposed by Hillebrand et al. (1999) and derived from biometric measurements made by Ribeiro (2010).

Taxonomic Sufficiency
These matrices either aggregated species abundances to a higher taxonomic level or to a taxonomic surrogate or, in alternative, retained the species-level analysis to a restricted set of common species or to the species from a single genus. Three main approaches were followed: (a) Taxa selection -two different lines were included in this approach: (1) Exclusion of rare taxa. A common species matrix (CSM), with no data transformation, included only species with more than two occurrences and/or that surpassed, at least in one sample, 1% of abundance and it was composed by 94 species. This allows a direct comparison with the OSM and, thus, to solely assess the effect of the exclusion of rare taxa. Ribeiro et al. (2013) used the same 94-species matrix but fourth-root transformed the abundance data. It is referred in the present study as Transformed Species Matrix (TSM) in order to allow a linkage between both works; (2) Focus on a single genus. Navicula was the most diverse and abundant genus in the OSM. The NAV matrix only retains the 29 species of Navicula and was root-transformed to counter the overwhelming preponderance of a handful of very abundant species. (b) Taxonomic resolution -the effect of reducing taxonomic resolution is tested in this approach. The species abundances found in OSM were aggregated to the genus level (57 genera) in a genus matrix (GM). Higher taxonomic levels were not pursued because suprageneric relationships in diatoms remain largely unknown (Cox, 2009) and a major restructuring of the current diatom classifications is still an ongoing process Kociolek, 2007, 2010). A presence/absence GM (0-1 GM) was also created. (c) Taxonomic surrogacy -One trait-based abundance matrix was assessed and compared with the taxonomy-based ones. The size-class matrix (SCM) were obtained from the OSM by simple aggregation of species abundance in the four sizeclasses described by Ribeiro (2010) and which comprised the very small (<100 µm 3 ), small (100-250 µm 3 ), mediumsized (250-1000 µm 3 ), and large (>1000 µm 3 ) diatoms. Given the low number of categories, the data were not transformed.

Matrix Comparison and Multivariate Analysis
Similarity matrices, also known as resemblance matrices, were constructed from the matrices listed in Table 1 using the Bray-Curtis distance (Bray and Curtis, 1957). The multivariate patterns on these datasets were then compared to each other in three different ways: (1) Inter-matrix correlations: Following the method described in Somerfield and Clarke (1995), the rank correlation coefficient (Spearman's ρ) between all the elements of any pair of similarity matrices with matching set of samples can be calculated. All inter-matrix rank correlations were determined, thus allowing a construction of a second similarity matrix, which was used as input matrix of a "second-stage" non-metric MDS (Somerfield and Clarke, 1995). This ordination permits simultaneous comparisons of all datasets and allows to perceive the relative effect of the different approaches and transformation of the data. Given that all matrices stem from the same OSM, they are not independently derived and a permutation test between resemblance matrices cannot be applied (Clarke, 1993).
(2) Effect on multivariate analyses (ANOSIM, MDS): Also following the approach proposed by Somerfield and Clarke (1995), the effect on subsequent non-parametric multivariate routines for each matrix listed in Table 1 was examined. An analysis of similarities permutation test, with a two-way crossed (with no replicates) layout (ANOSIM, Clarke and Warwick, 1994), was done on each similarity matrix to examine the significance of differences between all stations and sampling dates. Multidimensional scaling (MDS) ordinations were performed to better visualize the multivariate patterns of different similarity matrices (Clarke, 1993;Clarke and Warwick, 2001). (3) The relationships between patterns in multivariate structure and the environmental dataset collected simultaneously to the MPB sediment samples were examined using the BEST significance test procedure (Clarke et al., 2008). This analysis computes the Spearman's rank (ρ) correlation between the biotic and abiotic dissimilarity matrices and selects the subset of environmental variables that scores the highest ρ, thus choosing the combination of variables that maximizes the match between the biotic and abiotic datasets. In addition, the statistical significance of this match is given by a global match permutation test of the null hypothesis ρ = 0 (999 permutations of sample labels for a H 0 rejection at p < 0.1%). The environmental dataset was the one used by Ribeiro et al. (2013) and included: tidal height; sediment temperature; light (i.e. Photosynthetic Photon Flux Density, PPFD); porewater salinity; NH + 4 , NO − 2 , NO − 3 , PO 3− 4 , and SiO − 2 porewater concentrations; organic matter content (i.e. ash-free dry weight); and sediment grain size composition (following the Wentworth grade scale for grain size). Sediment water content had a very high Pearson's correlation with mud content (i.e. % grains < 63 µm), so only the latter was included in the analysis.

RESULTS
The interrelationships between the different approaches and the inter-matrix correlations ( Table 2) can be visualized in the "second-stage" MDS (Figure 2), where a bigger proximity between two similarity matrices in the MDS ordination reflects a higher pairwise Spearman's rank correlation. The first striking result is the fact that original species matrix (OSM) and the CSM scored an almost perfect correlation (ρ = 0.9999), which can be perceived by the superposition of both points in Figure 2. The TSM scored slightly lower correlations with both OSM and CSM (ρ = 0.911), thus implying that the fourth-root transformation of the dataset had a greater effect than the reduction from initial 181 taxa (in the OSM) to 94 common taxa (both in CSM and TSM). This is further reinforced by the higher correlation of TSM to both species (0-1 SM, ρ = 0.971) and genera (0-1 SM, ρ = 0.934) presence/absence matrices, as both transformations reduce or nullify, in the case of the presence/absence, the weight and influence of the high abundances of the dominant species in the assemblages.
The 57-taxa genera abundance matrix (GM) scored a lower correlation to the OSM (ρ = 0.841) than the 29-taxa Navicula species abundance matrix (NAV, ρ = 0.886) or than the size-class abundance matrix (SCM, ρ = 0.866), which was composed of only four categories. The correlation between both genera matrices was also comparatively lower (ρ = 0.674) when compared to other inter-matrix correlations, namely between species-level ones. The exception was the species biovolume matrix (BSM) which, despite its high taxonomic resolution, scored the second lowest correlation with the OSM (ρ = 0.745). Moreover, its isolated FIGURE 2 | Second-stage ordination by MDS of ranked inter-matrix pairwise Spearman rank correlations. Resemblance matrices included: Original (OSM), common (CSM), transformed (TSM) and Navicula species abundances matrices; species relative biovolume (BSM) and genera abundances (GM) datasets; species (0-1 SM) and genera (0-1 GM) presence-absence datasets; size-class abundances dataset (SCM). The symbols represent the type of metrics used in each matrix: abundance data ( ), presence/absence ( ), and weighted biovolume (♦). position in MDS ordination (Figure 2) indicates comparatively low correlations with all other matrices. Nonetheless, all matrices scored a correlation above ρ > 0.7 with the OSM, which means that most of the matrices should give correlated to highly correlated patterns in their analyses ( Table 2).
The MDS ordinations of the diatom assemblage data from the Tagus estuary (Figure 3) indicate that the overall patterns of community structure are mostly retained. Even as "raw data, " available in the OSM, it clearly shows a distinction between diatom assemblages from sandy stations and assemblages from muddy stations (i.e. V2 and V3). The assemblages found in the low mud-content/fine sand station A1 are usually the furthest away from the mudflat ones, with the mixed sediment, muddysandy assemblages of stations A2, A3, and V1 between them. This pattern is repeated in all other ordinations, although in the case of the BSM it becomes much less obvious. Moreover, in the latter MDS, the mudflat samples cluster together, while the sandflat ones are much less aggregated, a pattern that is the inverse of the one observed in all other MDS ordinations. With all other species-level matrices (OSM, CSM, TSM, NAV, and 0-1 SM) it was also possible to separate clearly the assemblages of the station A1 from the other assemblages, collected in the mixed sediment, medium sandy stations (i.e. A2, A3, and V1). The separation between these two types of sandflat assemblages was less discernible with the genera and size-class matrices (Figure 3).
The results of the ANOSIM tests showed that all matrices were able to discriminate significantly the spatial differences, but only species-level matrices rejected both null hypotheses of the two-way ANOSIM test ( Table 3). The genera presence/absence matrix (0-1 GM), with a significance probability of 16.3%, clearly failed to reject the null hypothesis that there were no temporal differences (test significance level p < 0.01%), while the species biovolume (BSM), genera abundances (GM), and size-classes (SCM) matrices failed to reject the second null hypothesis at a much lower probability level. The global R for differences between sites ranged from 0.93 (TSM) to 0.67 (SCM). Apart from BSM, the species matrices had slightly higher R-values for both differences between sites and differences between sampling dates, when compared to the genera and size-classes matrices. The Navicula dataset had the highest global R for temporal differences The ANOSIM global test statistic (R), at a significance level p < 0.01%, allows evaluating if the ability to discriminate sites and sampling dates is maintained. The BEST routine's test measures the link between the environmental dataset and the different matrices of the biotic data at p < 0.1% (999 permutations). Spearman's value (ρ) of the environmental variable subset that is highest correlated to the biotic datasets are also presented.
(R = 0.34). Finally, the presence/absence matrices had a slightly higher global R-values for differences between sites than the corresponding species or genera abundance matrices, while a considerably lower global R for differences between sampling dates ( Table 3).
The relationships between patterns in multivariate community structure and the environmental variables were examined using the BEST procedure. All biotic datasets had significant correlations (p < 0.1%) with the selected environmental variable subsets, ranging from ρ = 0.598 (BSM) to ρ = 0.863 (TSM), which meant that the correlations between the biotic and abiotic datasets varied from moderate, in case of the biovolume and size-classes matrices, to strong in the case of the genera and species-level matrices ( Table 3). These results were comparable to the other previous analyses. The species-level matrices had very similar correlation values, albeit not always selecting the same subset of environmental variables. The genera matrices both had "mud" as the highest correlated environmental variable, with the presence/absence dataset scoring a slightly higher correlation (ρ = 0.86).

Rare-Species Exclusion
The reduction of the diatom abundances matrix from the 181-taxa original matrix (OSM) to the 94-taxa CSM and TSM matrices had a negligible influence in the overall multivariate results. The Bray-Curtis similarity matrices derived from OSM and CSM had an almost perfect match, despite the exclusion of 48% of the initial taxa, and both had extremely high correlations with the TSM matrix, used in Ribeiro et al. (2013), which had the best results overall. The TSM matrix had the highest correlation between diatom community patterns and environmental variables (BEST test) and one of the top results in the ANOSIM test, where it performed slightly better in discriminating sampling dates than most of the other matrices. The current study seems to indicate that the comparatively slight gains in clarity were more a consequence of data transformation (i.e. fourthroot transformation of abundance data) than from the fact that the rare taxa were discarded. Nevertheless, it is likely that a combination of both steps was responsible for the clearer MDS ordination and cluster analysis results shown in that previous work.
Rare taxa exclusion can be somewhat arbitrary and may have a great impact in the overall results of several multivariate approaches (Cao et al., 1997(Cao et al., , 2001Cao and Williams, 1999) but it also eliminates accidental occurrences that cloud the final multivariate outcomes (Marchant, 1999). In a study that evaluated the effect of exclusion of diatom taxa on multivariate analysis in a large diatom dataset, Lavoie et al. (2009) concluded that the exclusion of taxa based on relative abundances could be confidently made until a ≥2% threshold, but that extra care must be taken when excluding taxa based on frequency of occurrence. In our study, a very conservative "cut-off " line was chosen (i.e. a ≥1% threshold and minimum of two occurrences in 68 samples) that allowed the elimination of allochthonous species (e.g. all phytoplanktonic taxa). However, as several authors argue, there is no biological justification for excluding rare species (Cao et al., 2001) and, contrary to PCA and DCA ordinations, the multivariate routines based on Bray-Curtis resemblance matrices do not require the exclusion of rare species (Clarke and Warwick, 2001). There are also no gains in time during diatom analysis, as the rarity of each species can only be established after the cell counts. Consequently, even though it is not a necessary step, the exclusion or rare species does not seem to hinder the multivariate analysis and may improve, even if slightly, the overall results.

Qualitative Analysis: Species Presence-Absence
Species composition studies only score the presence or the absence of taxa in each sample. High taxonomical resolution is maintained but there is no abundance data. This approach is much less time-consuming since scoring 300-600 individuals per sample is not necessary. Hence, its interest as a time-saving approach can be significant in large surveys. Our results showed that the species matrix based on binary data replicated well the several multivariate analyses outcomes and had a high correlation with the environmental variables dataset.
Taxonomical information alone seemed sufficient to spatially discriminate samples, but it had very weak temporal signal. This was anticipated, given that Ribeiro et al. (2013) concluded in that spatial differences were mainly brought about by changes in sediment texture, while temporal variations were mainly caused by seasonal shifts in abundance of a few dominant species from the mudflat assemblages and that the epipsammon-dominated assemblages had a stable structure throughout the 2-year study. The removal of abundance data eliminated the seasonal peaks of epipelic species and, thus, concealed the temporal patterns. The spatial patterns were retained because the taxonomic differences between the assemblages of each site were not affected. Therefore, it is to be expected that in large spatial surveys covering the salinity gradient of an estuary (e.g. Rovira et al., 2012) but also several sediment textures (e.g. Sabbe and Vyverman, 1991), qualitative data may prove to be sufficient to adequately describe the MPB communities. Likewise, it would also mean that if there are seasonal changes in silt content on a given tidal flat (e.g. Méléder et al., 2007), temporal variations are likely to be detected at the species level, even without abundance information. However, if a spatial study selects mudflats of similar sediment texture or grain size (Forster et al., 2006), or uses the lens-tissue method (Eaton and Moss, 1966) to selectively collect the epipelic fraction of the MPB (Thornton et al., 2002), spatial differences between diatom assemblages will mainly be caused by changes in the relative abundances of the most common species along the salinity and nutrient gradients (cf. Underwood et al., 1998). Therefore, the removal of abundance data would reduce much of those spatial differences in a similar fashion to what it did to the seasonal patterns of the mudflat assemblages of the Tagus estuary binary dataset.
Several authors advise against the use of binary data in monitoring studies. Lavoie et al. (2009) demonstrated that ordinations based on presence/absence data are only capable of gross separations between impacted and reference sites in largescale monitoring studies, while Thorne et al. (1999) showed that clustering of Bray-Curtis similarities and ANOSIM analyses on binary matrices performed poorer than any other transformation of the same data (i.e. original, root, and fourth-root transformed data). Giving the same weight to all taxa may have major consequences when the community patterns are dependent on the abundance of a few dominant species. The current study shows that information taken from multivariate analysis based on binary, qualitative data may be enough to adequately display spatial patterns. But special caution is advised in larger spatial surveys, given the importance of abundance data in low-diversity, epipelon-dominated mudflat assemblages in estuarine areas.
Finally, a practical recommendation: in order to achieve a species-richness that falls in the expected asymptote of the species accumulation curve for the customary 300-600 valve count, at least 50 ocular fields should be screened when following presence-absence approach.

Taxonomic Sufficiency: Genus-Level Analyses
Genus-level abundance dataset had an intermediate correlation with the original species dataset (OSM) while the presence/absence matrix has one of the lowest. The overall multivariate spatial patterns were retained and the correlation with the environmental variables set was strong, but both matrices were incapable of significantly discriminating sampling dates. These results indicate, therefore, that information was lost by the reduction in taxonomical resolution but that change in intertidal diatom community structure can still be reflected at the genus-level.
The effect of taxonomic resolution in multivariate community patterns has been widely tested in freshwater and marine macroinvertebrate community studies (e.g. Dauvin et al., 2003;Anderson et al., 2005;Heino, 2008) and more rarely in freshwater diatom lotic communities (Rimet and Bouchez, 2012a). High correlations between species and genus richness (Hill et al., 2001;Passy and Legendre, 2006) and/or assemblage structure (Heino and Soininen, 2007) have been reported but Lavoie et al. (2009) found that genus-level multivariate analysis was only capable of detecting gross differences between impacted and reference sites in Canadian streams, mirroring the effects of presence/absence data transformation or of excessive exclusion of rare taxa.
The current study suggests the genus-level taxonomic resolution could be considered as sufficient to detect changes in community structure in coastal and estuarine intertidal areas, in particular when it is mainly caused by shifts in sediment texture. It should be noted, however, that seasonal patterns in the mudflat assemblages (i.e. in V2 and V3) were reduced to changes in the relative proportions of, essentially, three genera (i.e. Navicula, Cylindrotheca, and Gyrosigma). When the abundance data are discarded, the temporal signal disappears altogether at the genus-level but not at the species level ( Table 3). As speciesspecific seasonal blooms  are not recorded, temporal changes may become undetected with a reduction of taxonomical resolution. Secondly in less diverse, epipelondominated assemblages, spatial differences brought by slight changes in sediment texture may not be detected at the genuslevel. Finally, some information is bound to be lost when dealing with ubiquitous, abundant genera such as Navicula, which had high beta-diversity. The analysis of the 29-species Navicula set showed that it had a stronger correlation with the full species set than the genera datasets, thus underscoring the importance of this genus and of species-level community analysis. This impressive performance is an excellent example of the amount of variation and information that still exists within a single genus and, therefore, why the lack of sufficient taxonomic resolution in community ecology studies may be problematic (Kociolek and Stoermer, 2001a;Kociolek, 2005).
As in the case of the species-level binary data, more subtle spatial and temporal changes may pass undetected with the genus-level approach (Somerfield and Clarke, 1995;Hill et al., 2001;Lavoie et al., 2009). Some authors (e.g. Bowman and Bailey, 1997;Thorne et al., 1999) consider that preservation of abundance data is preferable to the maintenance of high taxonomic resolution in qualitative data. In our study the genus abundance matrix had lower discriminative power (ANOSIM) and lower correlation with the abiotic data (BEST) than the species presence/absence dataset. Interestingly, when binary data are coupled with lower taxonomical resolution the results were not considerably worse, except for the above mentioned temporal signal. The genus-level presence/absence had lower discriminative power (ANOSIM) but higher correlation with the abiotic data (BEST) than its species-level counterpart, as well as the genus abundance dataset. This result is in agreement with studies on marine and freshwater macrobenthic fauna (e.g. Olsgard et al., 1998;Heino, 2008), although the reasons for it are not entirely clear.

Diatom Cell Size and Biovolume
The effect of diatom cell size was assessed in two different ways: one discarded the taxonomic information and rearranged the abundance data in four size-classes, the other maintained the maximal taxonomic resolution but transformed the abundance data in a percentage of contribution to total biovolume. Both yielded poor results and the two lowest correlations to the OSM. They failed to detect temporal changes but still scored average to relatively high correlations with the environmental set and were able to discriminate the sites.
The use of size-classes does have the obvious advantage that almost no taxonomic expertise is needed, and that biometric data are easy to acquire and inter-calibrate. However, it should be noted that the four size-classes used in this study were originally established from this very dataset (i.e. OSM) and their distribution clearly reflected differences between the sampling stations . Their applicability in other diatom distribution studies still needs to be tested. Nevertheless, the range of the four size-classes was not randomly chosen. A series of studies on the Baltic Sea epiphyton Snoeijs, 2002, 2003;Snoeijs et al., 2002;Ulanova and Snoeijs, 2006), indicated that diatoms smaller than 1000 µm 3 and diatoms bigger than 1000 µm 3 responded differently to environmental gradients, namely to salinity and exposure to wave action, and recommended that both size-classes should be counted and analyzed separately. As for the smaller diatoms, the commonly attributed dominance of small diatoms in European mudflats (e.g. Admiraal et al., 1984;Haubois et al., 2005;Sahan et al., 2007) should, in fact, be attributed to a medium-sized group (i.e. 250-1000 µm 3 ) which is mainly composed of several Navicula species . Finally, Ribeiro (2010) divided the <250 µm 3 group in two classes to stress the fact the very small diatoms (i.e. <100 µm 3 ) dominated the sandier sites (i.e. A1 and A2) but were not present in the mudflat ones, whereas the small diatoms (100-250 µm 3 ) size-class appeared both in sandflat and mudflat assemblages.
Studies by Haubois et al. (2005) and Lavoie et al. (2006) showed that both relative abundance and relative biovolume metrics obtained the same overall results but the explained percentage of species variance was higher with relative abundance data than with the relative biovolume metric. Haubois et al.'s (2005) study is particularly relevant because it was made on intertidal mudflat assemblages. It was able to show that, with this relative contribution to total biovolume, the epipelic assemblages were episodically dominated by large species. This temporal signal can also be perceived by a genus-level analysis (see above), as it is brought about by the species-specific blooms of large Pleurosigma and Gyrosigma species. Passy (2008) suggested that, in a lotic system, a small habit does confer resistance to disturbance of flow, grazing, and sinking and aids in dispersal, while a large habit is deemed advantageous in disturbance-free but nutrient-rich systems, where a tall stature provides a better access to nutrients and light, a greater surface for maximizing nutrient uptake rate, and a greater nutrient storage capacity. But she concluded that biovolume was more strongly related to density in the benthos than in the phytoplankton and that species distribution was a much more important descriptor of density at larger scales and a slightly better predictor than biovolume at local scales. In an intertidal system, dispersal seems less affected by cell size, as benthic diatoms of all types will be resuspended in estuaries and coastal areas (de Jonge and van Beusekom, 1992;Hernández Fariñas et al., 2017), but the size-class distribution will surely reflect sediment exposure, as smaller diatoms are spared from collision with sand grains, while larger ones are destroyed (Delgado et al., 1991). Sandflat assemblages are, therefore, invariably dominated by small diatoms (Asmus and Bauerfeind, 1994;Ribeiro et al., 2013), while in the mudflat assemblages the two size fractions of the motile epipelic group may well be a crude reflection of species-specific differences. They can also be playing, per se, a major role in biofilm cell micro-cycling and stratification, potentiating niche differentiation in epipelic biofilms, but the effect of size in MPB biofilm eco-physiology still needs to be explored. For example, in the particular case of the large species Gyrosigma fasciola, an inherent higher capacity of non-photochemical quenching (NPQ) seems to protect this species from high irradiances at lower temperatures, giving this species a competitive advantage in the winter months (Serôdio et al., 2005), but these features cannot be attributed to size alone.
The current study indicates that cell size may be a crude way to separate assemblages from different sites, given that the overwhelming effect of hydrodynamic stress and sediment texture in soft-bottom intertidal areas (Paterson and Hagerthey, 2001) is also reflected in their size-class distribution. As for the usefulness of relative biovolume metrics, the extra time spent in biovolume calculations (Hillebrand et al., 1999) added to the already lengthy taxonomical identifications without bringing any extra clarity to overall results. Therefore, the applicability of species relative biovolume or size distribution does not seem particularly promising.

When to Keep the Species Abundances Approach
The current study proposes and compares several less laborintensive options to the species abundances community analysis. It should be added, nonetheless, that some recent lines of MPB research do predicate on species abundances metrics or have much to gain when including them. This point is stressed by Underwood (2005), who highlighted how species composition influences biofilm function: not only there are significant differences in photosynthetic efficiency between epipelic diatom species (Oxborough et al., 2000), each taxon has its own migration pattern during a tidal exposure cycle, which in turn impacts the photophysiological response of individual cells but also of the biofilm as whole (e.g. Perkins et al., 2002;Paterson et al., 2003;Underwood et al., 2005). Moreover, Forster et al. (2006) and Vanelslander et al. (2009) depicted the effect of species richness and identity on epipelic biomass in both field and experimental conditions, while Barnett et al. (2015) showed clearly differences in the photophysiology of epipelic and epipsammic taxa. This latter example suggests that using growth-forms categories as surrogates to species-level analysis could be a valid alternative approach, but the allocation of diatom specimens to given a growth-form precludes a prior species-level identification in most cases and, thus, spending even more time and effort in assemblage description.
Microphytobenthos researchers should be aware that differences in biofilm function can be linked to both differences in species composition as well as to more conventional causes, such as nutrient concentration or photoacclimation (Underwood, 2005). For example, experimental settings that study the interactive effects of environmental variables on MPB biofilms (e.g. Cartaxana et al., 2015), or comparisons between light and O 2 microenvironments in natural intertidal sediments and their effect on biofilm photophysiology (e.g. Cartaxana et al., 2016) ideally require knowledge of species composition and their abundance (Underwood and Barnett, 2006). Furthermore, for research to be truly reproducible, peer-reviewed ecology journals should always include taxonomic information and how it was obtained (Vink et al., 2012). Unfortunately, it is not always possible to get this type of data and, hopefully, the current work provides a satisfactory way for MPB researchers to choose a valid alternative that is adequate to their studies objectives.

CONCLUSION
Many MPB ecologists lack satisfactory diatom identification skills or may have budget and/or time limitations that constrains them to complement their research with a sound description of taxonomic structure of the MPB biofilms they are studying. It is hoped that current work provides an objective way to evaluate and choose adequate surrogates to species-level diatom analysis in intertidal areas. All approaches tested in this study performed relatively well and replicated satisfactorily the multivariate patterns given by the species abundances dataset previously shown by Ribeiro et al. (2013), where very contrasting sediment textures were clearly reflected in differences in community structure, which were not only taxonomic but also in diatom size and functional group distribution. Even though all approaches were able to detect sharp environmental changes, there were enough differences between their multivariate routines' performances to establish a set of guidelines to be followed: (1) Species presence/absence and genera abundances approaches were the most promising alternatives, with the latter performing slightly worse than the former. However, if a fast and crude spatial distinction of tidal flat assemblages from contrasting sediments is needed, probably the most adequate choice is a genus-level abundances matrix. This approach retains some of the abundance information and does not need great identification skills.
(2) Seasonal variations can only be confidently detected with species-level abundances. Species-specific blooms are responsible for the temporal shifts in the mudflat diatom assemblages and they disappear when using binary or genus-level datasets. Hence, neither the qualitative nor the genera approaches are advised in the case of temporal studies. (3) The utility of size-classes as a surrogate classification seems limited, as it may not detect important environmental shifts, such as slight differences in sediment texture. However, their suitability in detecting nutrient gradients or light regimes should be further explored, as the sizeclasses presented here may well correspond to different niches in the epipelic assemblages usually found in mudflat environments. (4) The weighted relative biovolume approach should be avoided. This method is very time-consuming and failed to detect important differences in the structure of assemblages collected from different sediments.

DATA AVAILABILITY STATEMENT
This article contains previously unpublished data. The name of the repository and accession number(s) are not available.

AUTHOR CONTRIBUTIONS
LR, BJ, and VB designed the fieldwork and outlined the sampling strategy. LR and BJ performed the sampling and laboratory analyses. LR, LB, and TH-F analyzed and interpreted the data. LR wrote the manuscript. All authors read, critically revised, and approved the final version of the manuscript.