Applicability of taxonomic sufficiency of macrofaunal assemblages in benthic ecological quality assessments: Insights from a semi-enclosed bay

To explore the applicability of taxonomic sufficiency in benthic ecological quality assessments, we analyzed the similarity of data matrices under different taxonomic levels and transformations based on macrofaunal data sampled from a semi-enclosed bay. The similarity matrices at the species level were highly significantly correlated with those at the higher taxonomic levels (genus, family and order), while the correlation coefficients decreased with increasing taxonomic level. Second-stage CLUSTER plots showed that the quantitative genus level was the closest to the quantitative species level. The loss of information at the family level varied among seasons. The responses to environmental factors at the genus and family levels were similar to those at the species level. In benthic ecological quality assessments of semi-enclosed bays, macrofauna can be identified to the genus or family level to improve cost-effectiveness, depending on logistical conditions.


Introduction
Accurate biological classification and identification techniques are required for obtaining accurate biological data when using organisms for environmental quality assessments (Cadotte et al., 2013; Gerwing et al., 2020). However, traditional survey programs usually require sampling over a large area and classifying and identifying a large number of biological samples in a limited time. This leads to problems such as a heavy workload, high demands on expertise, and long processing times (Bilyard, 1987; Wu et al., 2012; Zhu et al., 2018). In this context, the use of higher taxonomic levels instead of the species level for environmental assessments was proposed. In 1985, Ellis introduced the concept of taxonomic sufficiency in pollution assessment studies (Ellis, 1985). Taxonomic sufficiency refers to the use of higher taxonomic levels (genus, family, etc.) instead of the species level to assess ecological quality status without losing statistical significance in environmental quality assessments. Because it does not require identification of biological samples at the species level, taxonomic sufficiency reduces not only the expertise and workload required for identification but also the chance of taxonomic errors. In some studies, similar results were obtained when using data at the family or genus level instead of the species level in environmental quality assessments (e.g. Terlizzi et al., 2009; Aguado-Giménez et al., 2015; Xiong et al., 2018). Some studies on environmental stress responses have shown that responses to environmental factors at the family or genus level are similar to those at the species level (Landeiro et al., 2012; Pitacco et al., 2019a).
Many studies have concluded that the identification of organisms at the genus and family levels is sufficient (Chessman et al., 2009; Dauvin et al., 2016; Pitacco et al., 2019b), and one study even concluded that environmental quality assessments conducted at the family and genus levels are more accurate than those conducted at the species level (Checon and Amaral, 2017). Meanwhile, some researchers have questioned the reliability of taxonomic sufficiency, arguing that species-level information is lost when higher taxonomic levels are used for environmental quality assessment, especially when macrofauna of the same family or genus differ greatly in pollution tolerance. In their view, only identification at the species level can accurately reveal the response of assemblage structure to environmental stress (Maurer, 2000; Elbrecht et al., 2017; Laini et al., 2020).
Considering the expertise needed to identify biological samples at the species level and the large workload involved, many countries and regions worldwide do not have sufficient resources to conduct such detailed research (Li, 2015). Taxonomic sufficiency has been applied in many regions globally as a rapid method for conducting environmental quality assessments (Terlizzi et al., 2003). A study on heterogeneous tropical coastlines in India showed consistent spatial and temporal distribution trends at the order, family, genus, and species levels in polychaetes (Vijapure and Sukumaran, 2019). The taxonomic diversity of macrofauna in Xiamen Bay showed that the assemblage structure at the species, genus, and family levels was highly consistent, and that less information was lost at the genus level than at the family level (Zhu et al., 2018). A study on Guanabara Bay showed similar results when species were aggregated into genera and families (Soares-Gomes et al., 2012). A study conducted in the southern Gulf of Mexico found almost no loss of information for the assessment of benthic ecological quality when using the family level instead of the species level (Domínguez-Castanedo et al., 2007).
Semi-enclosed bays with large areas of water and small waves are ideal for the development of port shipping and aquaculture. However, with increasing human activities around bays (such as sewage discharge, aquaculture, port shipping, and tourism), pollution of semi-enclosed bays is a major concern (Peng et al., 2013). Macrofauna has often been used to assess the quality of marine benthic environments. Jiaozhou Bay is a typical semi-enclosed bay in the southern Shandong Peninsula. Historical studies on the macrofaunal assemblage in Jiaozhou Bay showed that the abundance and biomass of macrofauna increased from the central region of the bay to the northern and southern regions, and that the bottom temperature and substrate characteristics were important factors influencing the distribution of macrofauna in the bay. Based on macrofauna data from four seasons in Jiaozhou Bay, we analyzed the correlation between different taxonomic levels of macrofauna and their response to environmental stresses under different data transformations. We explored whether higher taxonomic levels could yield the same or similar assessment results as the species level, to provide a basis for applying macrofaunal taxonomic sufficiency in benthic ecological quality assessments.

Study site
Jiaozhou Bay is located in the southern Shandong Peninsula and is a typical semi-enclosed bay and one of the earliest bays on which benthic ecology studies were conducted in China. Several rivers around Jiaozhou Bay, such as the Nanjiao and Dagu Rivers, constantly transport various nutrients into the bay, resulting in high biodiversity and productivity in the bay. The dense population around Jiaozhou Bay, river runoff inputs, point source pollution discharge, and other anthropogenic activities have aggravated eutrophication in this region.

Field sampling and laboratory analysis
Macrofauna sampling was conducted in Jiaozhou Bay in February, May, August, and November 2014, representing four seasons (winter, spring, summer, and autumn), and a total of 14 stations (J1-J14, Figure 1) were set up for the survey, with the sampling-station map being obtained from Han et al. (2021) and Lu et al. (2021). The station locations were consistent for each season, while Station J6 was not sampled in spring and autumn because of weather conditions. Four replicate samples were collected using a 0.05 m² box-corer, sorted on board using a 0.5-mm sieve, and combined into one sample. Macrofauna and residues were then stored in 500-mL sample bottles and preserved in 75% ethanol. Surface sediments were also collected at each station and brought back to the laboratory to determine the water content, organic matter content, median grain size, chlorophyll a (Chl-a), and phaeophorbide (Pha). The sediment grain size was measured using a laser particle size analyzer (Master Sizer 3000; Liu et al., 2007), and Chl-a and Pha were determined using fluorescence spectrophotometry and calculated using the correction formula of Wang (1986). The organic matter content was determined using the potassium dichromate-sulfuric acid (K₂Cr₂O₇-H₂SO₄) oxidization method. The water depth, bottom temperature, salinity, and pH at all stations were measured in situ using a YSI 600XLM Multi-Parameter Water Quality Sonde (YSI Inc., USA).
The macrofaunal samples were stained for 24 h in the laboratory using 1 ‰ Rose Bengal. After sieving and washing using a 0.5-mm sieve, all macrofauna specimens were collected, identified under a stereomicroscope, and weighed using an electronic balance with 0.0001 g sensitivity.

Statistical analysis
The multivariate analysis software PRIMER 6.0 was used to analyze the assemblage structure. The macrofauna species-abundance data identified at the species level were taxonomically aggregated to the genus, family, and order levels to create a total of four datasets. The aggregation rate (j) (ratio of the higher rank taxon count to the species count) was calculated separately for the different taxonomic levels (Bevilacqua et al., 2012). The ratios were used to evaluate the consistency between species and higher taxonomic levels.
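The aggregation step described above can be illustrated with a short sketch. The taxonomy lookup, taxon names, and abundances below are hypothetical placeholders, not data from this study, and the aggregation rate is computed simply as the number of higher taxa divided by the number of species present, following the definition given in the text.

```python
from collections import defaultdict

# Hypothetical taxonomy lookup: species -> (genus, family, order).
TAXONOMY = {
    "Species a1": ("Genus A", "Family X", "Order P"),
    "Species a2": ("Genus A", "Family X", "Order P"),
    "Species b1": ("Genus B", "Family X", "Order P"),
    "Species c1": ("Genus C", "Family Y", "Order Q"),
}
LEVELS = {"genus": 0, "family": 1, "order": 2}


def aggregate(abundance, level):
    """Sum species abundances within each taxon of the requested level."""
    idx = LEVELS[level]
    totals = defaultdict(float)
    for species, count in abundance.items():
        totals[TAXONOMY[species][idx]] += count
    return dict(totals)


def aggregation_rate(abundance, level):
    """Number of higher taxa divided by the number of species present."""
    present = {sp: n for sp, n in abundance.items() if n > 0}
    return len(aggregate(present, level)) / len(present)


# Hypothetical abundances at one station (individuals per sample).
station = {"Species a1": 12, "Species a2": 3, "Species b1": 5, "Species c1": 1}
genus_counts = aggregate(station, "genus")
genus_rate = aggregation_rate(station, "genus")
family_rate = aggregation_rate(station, "family")
```

A rate close to 1 means that most higher taxa contain a single species, so little is merged by aggregation; a low rate means many species collapse into few taxa.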
To analyze the relationship between the same diversity indices at different taxonomic levels, Pearson's correlation coefficients were calculated using SPSS 22.0. Five data transformations (none, square root, fourth root, log(x+1), and presence/absence), which progressively reduce the contribution of dominant taxa, were then applied to each of the four taxonomic-level datasets using PRIMER 6.0 to produce a total of 20 data matrices. Thereafter, the Spearman rank correlations between the species-level and higher-taxonomic-level Bray-Curtis similarity matrices under the different data transformations were analyzed using the RELATE routine, and the corresponding correlation coefficients were calculated. The correlation coefficient rho ranges from 0 to 1, where rho = 0 indicates that the two taxonomic levels are not correlated and rho = 1 indicates that they are perfectly correlated (Sun et al., 2021). To balance minimal information loss against minimal identification effort, the cost/benefit ratio (CB_L) was calculated for all combinations of taxonomic levels and data transformations, following Karakassis and Hatziyanni (2000), to select the appropriate taxonomic level.
In this ratio, CB_L is the cost/benefit ratio at taxonomic level L, r_L is the Spearman correlation coefficient between taxonomic level L and the species level, t_L is the number of taxa present at taxonomic level L, and S is the number of species. The ratio ranges between 0 and 1, where a lower value indicates higher cost-effectiveness of the treatment.
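The equation for the cost/benefit ratio is not reproduced in this text. As an illustration only, the sketch below assumes the form usually attributed to Karakassis and Hatziyanni (2000), CB_L = [(1 - r_L)/r_L] / [(S - t_L)/S], in which the numerator expresses the loss of information and the denominator the proportional reduction in the number of taxa. Both the formula and the function name `cost_benefit_ratio` are assumptions and should be checked against the cited paper before use.

```python
def cost_benefit_ratio(r_l, t_l, s):
    """Assumed form: CB_L = ((1 - r_L) / r_L) / ((S - t_L) / S).

    r_l: Spearman correlation between level L and the species level.
    t_l: number of taxa present at level L.
    s:   number of species.
    """
    if not 0 < r_l <= 1:
        raise ValueError("r_L must lie in (0, 1]")
    if not 0 < t_l < s:
        raise ValueError("t_L must be positive and smaller than S")
    information_loss = (1 - r_l) / r_l   # grows as the correlation drops
    taxa_reduction = (s - t_l) / s       # share of taxa saved by aggregating
    return information_loss / taxa_reduction


# Hypothetical values: 100 species aggregated into 60 families, rho = 0.8.
cb_family = cost_benefit_ratio(0.8, 60, 100)
```

Under this assumed form, a level that keeps a high correlation with the species matrix while greatly reducing the number of taxa yields a low value.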
The above 20 data matrices were visualized using second-stage non-metric multidimensional scaling (nMDS) to visually represent the correlation between different taxonomic levels.
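The matrix comparison that underlies the RELATE and second-stage analyses can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the PRIMER implementation: it applies a transformation, computes Bray-Curtis similarities between all station pairs, and correlates two such similarity matrices by Spearman rank correlation. The permutation test that RELATE uses for significance is omitted, and the abundance values are hypothetical.

```python
import math
from itertools import combinations

TRANSFORMS = {
    "none": lambda x: x,
    "sqrt": math.sqrt,
    "fourth_root": lambda x: x ** 0.25,
    "log": math.log1p,                      # log(x + 1)
    "pa": lambda x: 1.0 if x > 0 else 0.0,  # presence/absence
}


def transform(matrix, kind):
    """Apply one transformation to a stations x taxa abundance matrix."""
    f = TRANSFORMS[kind]
    return [[f(x) for x in row] for row in matrix]


def bray_curtis_similarities(matrix):
    """Bray-Curtis similarity (0-100) for every pair of stations (rows)."""
    sims = []
    for a, b in combinations(matrix, 2):
        num = sum(abs(x - y) for x, y in zip(a, b))
        den = sum(x + y for x, y in zip(a, b))
        sims.append(100.0 * (1 - num / den) if den else 0.0)
    return sims


def ranks(values):
    """Ranks starting at 1; tied values share their mean rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return out


def spearman(x, y):
    """Spearman rho as the Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical data: 4 stations x 4 species; the first two species
# belong to one genus, so the genus matrix sums those two columns.
species = [[10, 0, 3, 1], [8, 2, 0, 0], [0, 5, 4, 2], [1, 1, 1, 9]]
genus = [[row[0] + row[1], row[2], row[3]] for row in species]

rho = spearman(
    bray_curtis_similarities(transform(species, "fourth_root")),
    bray_curtis_similarities(transform(genus, "fourth_root")),
)
```

Repeating the last step for every pair of the 20 matrices would give the matrix of rho values on which a second-stage ordination is built.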
Based on the results of the nMDS plot, we selected the species-, genus-, and family-level matrices produced with the fourth root transformation as quantitative data (Quan) and those produced with the presence/absence transformation as qualitative data (Qual) for second-stage CLUSTER analysis (Xu et al., 2016), calculated the similarity between the matrices, and drew a CLUSTER plot to analyze the degree of information loss between assemblages at different taxonomic levels. BIOENV analysis was used to identify the best-matching combination of environmental factors determining macrofaunal distribution patterns (Wang et al., 2017).

FIGURE 1
Map of the sampling stations in Jiaozhou Bay, China.

Liu et al. 10.3389/fmars.2023.1130696 Frontiers in Marine Science frontiersin.org

Univariate indices for different taxonomic levels
As shown in Figure 2, the Shannon-Wiener diversity index (H'), Simpson diversity index (D), Margalef's richness index (d), and Pielou's evenness index (J') at different taxonomic levels exhibited similar spatial and temporal trends, and the Simpson diversity index (D) and Pielou's evenness index (J') at different taxonomic levels were very similar among stations. Pearson's correlation coefficients were calculated between the same diversity indices at four different taxonomic levels: species, genus, family, and order, and it was found that the relationships between the same diversity indices at different taxonomic levels were all highly significantly positive (p<0.01). In general, higher taxonomic levels yielded lower values of H', D, and d. Diversity indices derived from different taxonomic levels were significantly lower at stations 2, 3, and 12 in summer and at station 2 in autumn.
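The four univariate indices compared above can be computed from a single station's abundance vector. The sketch below is illustrative and rests on assumptions the text does not state: natural logarithms for H', Simpson's index in the form D = 1 - Σp_i², Margalef's d = (S - 1)/ln N, and Pielou's J' = H'/ln S. The counts are hypothetical.

```python
import math


def diversity_indices(counts):
    """Return H', D, d and J' for one sample of taxon abundances."""
    counts = [c for c in counts if c > 0]
    total = sum(counts)        # N, total number of individuals
    richness = len(counts)     # S, number of taxa present
    props = [c / total for c in counts]
    shannon = -sum(p * math.log(p) for p in props)
    simpson = 1 - sum(p * p for p in props)
    margalef = (richness - 1) / math.log(total) if total > 1 else 0.0
    pielou = shannon / math.log(richness) if richness > 1 else 0.0
    return {"H'": shannon, "D": simpson, "d": margalef, "J'": pielou}


# Hypothetical sample: four equally abundant species in two genera.
species_level = diversity_indices([5, 5, 5, 5])
genus_level = diversity_indices([10, 10])
```

In this toy case the genus-level H', D, and d are all lower than the species-level values while J' stays at its maximum, which is in line with the pattern reported above for H', D, and d.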

Correlation analysis of taxonomic sufficiency
The correlations of the Bray-Curtis similarity matrices under different data transformations and taxonomic levels for the four seasonal macrofauna assemblages in Jiaozhou Bay (Figure 3) showed significant correlations (p<0.001) between the species-level and higher-taxonomic-level similarity matrices; however, the correlation coefficients decreased with increasing taxonomic level, regardless of the data transformation method. The correlation between species- and genus-level matrices was particularly high (Rho ≥0.965), independent of the type of data transformation applied. Aggregation to the family level resulted in matrix similarity coefficients (Rho) ≥0.8 between the family and species levels in the vast majority of cases (except for the summer quadratic-root-transformed similarity coefficient, which was 0.797, and the summer presence/absence-transformed similarity coefficient, which was 0.622); these values remained high, although lower than the genus-species correlations. The severe presence/absence transformation considerably lowered the correlations between the order-level and species-level matrices. The species-order comparisons had the lowest similarity coefficients under all data transformations compared with the species-genus and species-family comparisons. The CB_L values at the order level and presence/absence transformation were the lowest among the 20 data matrices for all taxonomic levels and data transformations (Table S1). In this study, the genus aggregation rate ranged from 0.87 to 0.91, the family aggregation rate from 0.60 to 0.68, and the order aggregation rate from 0.20 to 0.29 across the four seasons.
Each symbol in the second-stage nMDS ordination plot (Figure 4) represents a similarity matrix, and the degree of similarity between matrices is represented by the distance between symbols. As Figure 4 shows, the taxonomic levels and data transformations separate in different directions, indicating independent effects of these two treatments on the data. In the second-stage nMDS plot, similarity to the species-level matrix decreased as the taxonomic level increased from genus to order. The genus-level matrices were very similar to the species-level matrices, while the order-level matrices were the least similar. Notably, the distance between the fourth root and log(x+1) transformations was markedly smaller (Figure 4) than the distance between other data transformations, indicating that the matrices after the fourth root and log(x+1) transformations are highly similar.

FIGURE 3
Spearman correlation coefficient trends between the similarity matrices of different taxonomic levels (genus-species, family-species and order-species) with various data transformations (none, square root, fourth root, log (x+1), and presence/absence) in the four seasons of Jiaozhou Bay, China.
According to the second-stage nMDS plot, the highest similarity with the species level was at the genus level, followed by the family level, and the lowest similarity with the species level was at the order level. Moreover, the taxonomic levels with the lowest CB_L values under the same data transformation included only the genus and family levels (Table S1). According to the second-stage CLUSTER plot (Figure 5), during the four seasons, the quantitative species level was closest to the quantitative genus level, and the loss of information was less than 3%. The loss of information between the quantitative and qualitative species-level data was less than 8%. Comparing the quantitative data of the different taxonomic levels, the information lost from the genus-level data was less than 3% compared with the species-level data, whereas the degree of information loss from the family-level data varied greatly among seasons: 10.5% in spring, 20.3% in summer, 4.3% in autumn, and 7.4% in winter.

Correlation between macrofaunal assemblages at different taxonomic levels with environmental factors
The correlation between macrofaunal assemblages at different taxonomic levels and environmental factors was influenced by season, taxonomic level, and whether quantitative or qualitative data were used (Table 2). Considering the best combination of environmental factors and the correlation coefficients, there were marked seasonal differences in the response of macrofauna to environmental factors. Overall, the correlations between macrofaunal assemblages and environmental factors at the genus and family levels were similar to those at the species level. At the same taxonomic level, quantitative data were more strongly correlated with environmental factors than qualitative data. For both quantitative and qualitative data, the correlation with environmental factors decreased with increasing taxonomic level (except that the genus level was slightly higher than the species level in spring). The environmental factors most strongly correlated with macrofaunal assemblages were the same at every taxonomic level in spring, whereas in the other three seasons only the genus and species levels shared similar environmental factors. The correlation with environmental factors was highest at the genus level in spring and highest at the species level in the other three seasons.

Similarity of assemblage composition after data transformation
In this study, five different data transformations (none, square root, fourth root, log(x+1), and presence/absence) were used to explore the applicability of taxonomic sufficiency in benthic ecological quality assessments using macrofauna. The fourth root and log(x+1) transformations were the closest among all the data transformations, and identification to higher taxonomic levels (genus and family) was effective for assessing macrofaunal assemblages. Overall, as the transformation becomes more severe and the classification coarser, the CB_L values increase.

FIGURE 4
Second-stage non-metric multidimensional scaling (nMDS) of the four taxonomic levels (species, genus, family and order) with various data transformations (0: none; sq: square root; fr: fourth root; log: log (x+1); pa: presence/absence) in the four seasons of Jiaozhou Bay, China.

TABLE 2 notes
Corr. is the explanatory quantity for the combination of environmental factors. Quan represents quantitative data and Qual represents qualitative data. Environmental factors: 1, water depth; 2, bottom temperature; 3, bottom salinity; 4, pH; 5, chlorophyll a (Chl-a); 6, phaeophorbide (Pha); 7, water content; 8, organic matter content; 9, sorting factor si; 10, median grain size; and 11, silt-clay content.

FIGURE 5
Second-stage CLUSTER plots based on the similarity matrices of different taxonomic levels (species, genus and family) with quantitative data converted using the fourth root transformation (Quan) and qualitative data converted using the presence/absence transformation (Qual) in the four seasons of Jiaozhou Bay, China.

Similarity of assemblage composition at different taxonomic levels
In all four seasons, the similarity matrices at the species level and the higher taxonomic levels were significantly correlated (p<0.001), and the correlation coefficients decreased with increasing taxonomic level, regardless of the data transformation method. The correlation between a given higher taxonomic level and the species level changed considerably with season and data transformation; overall, the assemblage structure at the genus level was the most consistent with the species level, and that at the order level was the least consistent. The genus aggregation rate was higher than the family aggregation rate, which was much higher than the order aggregation rate. A previous study showed that the consistency of assemblage structure between species and higher taxonomic levels increases with higher rates of aggregation within assemblages, which leads to closer assemblage composition between taxonomic levels (Bowman and Bailey, 1997). Therefore, less information was lost at the genus and family levels than at the order level in Jiaozhou Bay. Another study showed that the aggregation rate has a major influence on the effectiveness of taxonomic sufficiency and that taxonomic sufficiency works better in biomes with high aggregation rates (Oliveira et al., 2020).
The relationships between the transformed data matrices of different data at different taxonomic levels were further analyzed using second-stage nMDS plots. The closer the relative position, the stronger the correlation of the corresponding assemblage structures. Regardless of the type of data transformation, the distance between the genus and species levels was smaller than that between the species level and family level during all four seasons; therefore, the consistency between the species and genus level assemblage compositions was higher.
Further CLUSTER analysis of the quantitative and qualitative data selected from the nMDS plot revealed that the quantitative data at the genus level in Jiaozhou Bay had the lowest loss of information during the four seasons (all less than 3%), which was similar to results obtained from the Liaohe estuary and northern Jiangsu (less than 3% and less than 5%, respectively) and better than that in Xiamen Bay (8%). The loss of information at the family level varied widely (4.3%-20.3%) among seasons, similar to results obtained from Xiamen Bay (20%) and differing remarkably from those obtained from the Liaohe estuary and northern Jiangsu (less than 6% and 7%, respectively) (Wu et al., 2012; Zhu et al., 2018; Sun et al., 2021). These results were consistent with the nMDS plot, indicating that the genus level was the most consistent with the species level and that the correlation between the family level and the species level was also high over the whole year. The reasons for this phenomenon need further study. In all four seasons, the genus-level quantitative data differed the least from the species-level quantitative data, and the genus-level qualitative data were closest to the species-level qualitative data. The genus- and species-level quantitative data of Jiaozhou Bay thus differed only slightly, whereas information was lost at the family level (Figure 5). In summary, genus- and family-level data could replace species-level quantitative data to some extent.

Analysis of the correlation between macrofaunal assemblages with environmental factors
The correlations between macrofaunal assemblages and environmental factors showed that the coefficients for the combination of environmental factors decreased with increasing taxonomic level, except in spring. It has been suggested that, when environmental pollution becomes heavier, pollution-tolerant species increase together because of their tolerance to environmental pollution, whereas sensitive species decrease as they avoid the unfavorable environment, so that higher taxonomic levels respond more clearly to environmental factors (Olsgard et al., 1998). Although the family-level aggregation rate in Jiaozhou Bay was large (j>0.6), our results were not consistent with those from the Liaohe Estuary (Sun et al., 2021), possibly because pollution was lighter and the majority of species in Jiaozhou Bay were sensitive species. The correlation between macrofaunal assemblages and environmental factors differed significantly among seasons, indicating that the optimal combination of environmental factors differed among seasons. Because Jiaozhou Bay is located in a warm temperate zone and has shallow water, the bottom temperature changes drastically. Thus, the response of macrofauna to bottom temperature is more pronounced in spring and winter, probably because the warming temperature in spring can promote the spawning, growth, and development of macrofauna larvae, while sudden temperature drops in winter may inhibit the feeding and other behaviors of macrofauna (Quan et al., 2020). The response to Chl-a may be related to the seasonal variation in nutrient input. With rapid population growth and industrial and agricultural development around Jiaozhou Bay in recent years, industrial and agricultural wastewater, domestic sewage, and solid waste discharged into Jiaozhou Bay have increased annually. Nutrient concentrations in Jiaozhou Bay are mainly influenced by coastal inlet rivers (Dagu River, Licang River, etc.) and urban discharges, and the nutrient input from these land-based sources leads to eutrophication in Jiaozhou Bay, which, in turn, elevates the Chl-a content (Song et al., 2020). The response to Chl-a is more pronounced in spring and summer, the two seasons with relatively concentrated rainfall in Qingdao, which intensifies the discharge of pollution. The water depth in Jiaozhou Bay is generally shallow, and human activities such as shipping and fishing are frequent, all of which influence the state of the substrate. Therefore, the macrofaunal assemblages responded markedly to the substrate-related indicators during the three seasons. Because the physicochemical properties of Jiaozhou Bay differed among seasons and led to seasonal differences in macrofaunal assemblage composition, the correlations between macrofaunal assemblages and environmental factors were not consistent among seasons. Within the same season, the explanatory degree of the combination of environmental factors decreased with increasing taxonomic level and transformation severity, because some information is lost as the transformation intensifies and the classification becomes coarser. In general, the correlations between macrofaunal assemblages and environmental factors at the genus and family levels were similar to those at the species level to some extent. Therefore, the identification of macrobenthos can be extended to the genus and family levels when environmental monitoring is conducted in Jiaozhou Bay.
In studies of biome structure and environmental responses, quantitative data are generally used because they reflect the environment more clearly than qualitative data. However, quantitative data require detailed work and are susceptible to the influence of dominant species. In contrast, qualitative data consider only the presence or absence of species, are not affected by dominant species, and work better for rare and diagnostic species (Serrana et al., 2022). The combinations of environmental factors identified by quantitative and qualitative data at corresponding taxonomic levels were highly similar in the four seasons, while the explanatory power of the environmental factors was lower for qualitative data than for quantitative data in all cases. Therefore, we recommend using species-level qualitative data instead of species-level quantitative data to reduce sampling effort in large-scale environmental monitoring surveys. However, this requires further study and discussion at larger spatial and temporal scales (Pires et al., 2021). In this study, the qualitative data at the different taxonomic levels were consistent with the quantitative data in terms of environmental response (Corr. and environmental factors) only in spring, so the qualitative data cannot be generalized to all applications.
In conclusion, taxonomic sufficiency has been widely used as a simplified biological monitoring method for aquatic ecosystems (Warwick, 1988). The identification of macrofauna at the species level usually requires a high level of classification expertise and increases the rate of identification errors, and it is not always feasible because of time constraints or economic costs (Furse et al., 1984). Studies have shown that the cost of identification can be greatly reduced by identifying to the genus and family levels, and that information loss at the genus level is less than that at the family level (Ferraro and Cole, 1995; Baldo et al., 1999). Moreover, environmental changes mainly affect assemblage structure through species replacement. Paying too much attention to species-level identification is usually unnecessary in environmental quality assessments, and environmental changes can also be monitored at a higher taxonomic level. Based on the results of this study, identification to the genus or family level is recommended for monitoring and evaluating environmental quality in semi-enclosed bays to improve cost-effectiveness.

Data availability statement
The original contributions presented in the study are included in the article/Supplementary Material. Further inquiries can be directed to the corresponding author.

Author contributions
JL contributed to the methodology, software, formal analysis, data curation, and writing - original draft. ZX contributed to the investigation, data curation, editing, and funding acquisition. XL contributed to the conceptualization, resources, writing - review and editing, supervision, and funding acquisition. All authors contributed to the article and approved the submitted version.

Funding
This study was jointly supported by the National Natural Science Foundation of China (No. 41976131) and the Key Laboratory of Marine Ecology and Environmental Science and Engineering, State Oceanic Administration, China (No. MESE-2019-08).