Data-Limited Stock Assessment for Fish Species Devoid of Catch Statistics: Case Studies for Pampus argenteus and Setipinna taty in the Bohai and Yellow Seas

For many fish stocks, such as Pampus argenteus and Setipinna taty in China, size composition data are more accessible than catch data. Varied results can arise when different length-based stock assessment models are applied to these data, and fishery managers often need to reconcile conflicting estimates of population status. Superensemble modeling, a relatively recent innovation in fish stock assessments commonly used in other fields, may provide an effective solution to resolving uncertainties among the results from multiple length-based models. To verify potential for this approach to improve estimates of population status, we applied ensemble modeling to fit simulated data of P. argenteus and S. taty in the Bohai and Yellow Seas using predictions from a length-based integrated mixed effects (LIME) and length-based spawning potential ratio (LB-SPR) models as covariables in a superensemble model developed in this study. All simulation modeling of P. argenteus and S. taty in the Bohai and Yellow Seas was conducted using the operating model in the R package LIME. Initially, the LIME and LB-SPR performances were tested separately under three scenarios of fishing mortality and recruitment variability (“equilibrium scenario,” “endogenous scenario,” and “one-way base scenario”). Then, estimates of spawning potential ratio (SPR) were combined with the superensemble models (a linear model, a support vector machines, a random forest and a boosted regression tree). We trained our superensemble models with 80% of the simulated data and tested them with the remaining 20%. Our results showed that superensemble modeling substantially improved the estimates of SPR, with support vector machines performing the best at estimating population status: precision improved by 12.7% for S. taty and 8% for P. argenteus on average (namely, median absolute proportional error decreased by 0.127 and 0.08 on average) compared to the individual models. This finding has important implications for fisheries management in the context of species for which catch data are unavailable. Applying the size composition survey data, the results from support vector machines superensemble model suggested that neither S. taty nor P. argenteus in the Bohai Sea in 2019 are overfished, but the stock status of P. argenteus warrants vigilant monitoring.

For many fish stocks, such as Pampus argenteus and Setipinna taty in China, size composition data are more accessible than catch data. Varied results can arise when different length-based stock assessment models are applied to these data, and fishery managers often need to reconcile conflicting estimates of population status. Superensemble modeling, a relatively recent innovation in fish stock assessments commonly used in other fields, may provide an effective solution to resolving uncertainties among the results from multiple length-based models. To verify potential for this approach to improve estimates of population status, we applied ensemble modeling to fit simulated data of P. argenteus and S. taty in the Bohai and Yellow Seas using predictions from a length-based integrated mixed effects (LIME) and lengthbased spawning potential ratio (LB-SPR) models as covariables in a superensemble model developed in this study. All simulation modeling of P. argenteus and S. taty in the Bohai and Yellow Seas was conducted using the operating model in the R package LIME. Initially, the LIME and LB-SPR performances were tested separately under three scenarios of fishing mortality and recruitment variability ("equilibrium scenario," "endogenous scenario," and "one-way base scenario"). Then, estimates of spawning potential ratio (SPR) were combined with the superensemble models (a linear model, a support vector machines, a random forest and a boosted regression tree). We trained our superensemble models with 80% of the simulated data and tested them with the remaining 20%. Our results showed that superensemble modeling substantially improved the estimates of SPR, with support vector machines performing the best at estimating population status: precision improved by 12.7% for S. taty and 8%

INTRODUCTION
Stock assessment involves providing scientific, quantitative evaluations to objectively inform fisheries management (Hilborn and Walters, 1992). Globally, only about 50 percent of exploited fish species have been assessed due to limitations imposed by the data requirements of traditional or conventional stock assessment methods (Ricard et al., 2012;RAM Legacy Stock Assessment Database, 2018). Evidence-based fisheries management decisions rely on stock assessments to provide crucial insights into the status of fish stocks and fisheries. Due to an urgent need for managing an increasing number of smaller less productive or marginal fish stocks as well as halting depletion and promoting recovery of long established once highly productive stocks, in recent years, data-limited methods to estimate stock status and exploitation have developed rapidly (MacCall, 2009;Dick and MacCall, 2011;Free et al., 2017). Most of these methods can be divided into catch-based methods and length-based methods.
Not all the catch histories for various fish species can be obtained from fisheries statistical records. In many fisheries with limited monitoring capacity, it is often easier to collect length measurements from scientific surveys or catch sampling, such as for Pampus argenteus and Setipinna taty in the Bohai and Yellow Seas of China, than to quantify total catch. P. argenteus is a warm water fish species which aggregates to form a spatially clustered distribution of schools and is one of the main commercial species in China coastal waters (Liu et al., 1990). The P. argenteus population in China can be divided into the Bohai-Yellow Sea stock and the eastern China Sea stock (Liu et al., 1990). The Bohai-Yellow Sea stock has been the main target of trawl and mass drift net fishing for many years (Jin et al., 2005a(Jin et al., , 2006Tang, 2006). Setipinna taty is distributed in spatially clustered schools along the coast of China (Liu et al., 1990;Jin et al., 2006Jin et al., , 2015. In recent years, with the decline of traditional economically remunerative fish resources, S. taty has become one of the main targets of all kinds of commercial net fishing adjacent to the coasts of the Bohai and Yellow Seas (Jin et al., 2006(Jin et al., , 2015. Lengthbased assessment methods, which simply require mean length or length composition of the catch and estimates of life history parameters, have become increasingly prevalent for evaluating the status of data-limited or data-poor fish stocks (Thorson and Cope, 2014;Hordyk et al., 2015;Then et al., 2015), and have great potential for application in fish species assessment where catch data are unavailable.
Exploratory analysis of existing data and some background research (Rudd and Thorson, 2018;Pons et al., 2019) found that different results are obtained when applying different stock assessment models that rely solely on length measurements. Rational design of management strategies must consider the uncertainty in model outputs (Schnute and Hilborn, 1993) and fisheries managers must often reconcile conflicting estimates of population status and trend. Taking the average or weighted average of several model predictions, is one often tried and effective solution (e.g., Burnham and Anderson, 2002;Anderson et al., 2017). In population assessments where the input data and error assumptions are compatible, there are many successful examples in which model averaging has objectively combined the results of different models (Brodziak and Piner, 2010;Millar et al., 2015). However, some studies (Schnute and Hilborn, 1993;Anderson et al., 2017) have demonstrated that when model or data errors are incompatible, the most likely parameter values are not intermediary to conflicting values; instead, they occur toward one of the apparent extremes.
Superensemble modeling, often simply referred to as ensemble modeling, has been commonly used with success for climate and weather forecasting It provides a technical framework for drawing predictions from a group of models as inputs into a separate statistical model (Krishnamurti et al., 1999;Hamill et al., 2012). This technique has been used to improve estimates of population status by optimally leveraging multiple catch-based model predictions (Anderson et al., 2017). This approach may overcome the shortcomings of model-averaging methods and provide an effective solution for reducing the uncertainties that arise in the results from multiple length-based models to identify the most plausible combination of model parameters.
In fisheries without catch data or information on relative or absolute abundance, stock assessments typically use spawning potential ratio (SPR) as an alternative reference point to biomass at maximum sustainable yield (B MSY ) (ICCAT, 2017;Pons et al., 2019). The ensemble-method framework developed in this study will provide appropriate options to inform the management of fish stocks for which catch data are unavailable.

MATERIALS AND METHODS
To verify the potential of ensemble methods for improving estimates of population status, we fitted simulated data for stocks of P. argenteus and S. taty in the Bohai and Yellow Seas using predictions from a Length-based Integrated Mixed Effects (LIME, Rudd and Thorson, 2018) model and a lengthbased spawning potential ratio (LB-SPR, Hordyk et al., 2015) model as covariables in the superensemble models developed in this study, and compared their predictive performance against each other and their individual component models. LIME and LB-SPR are length-based methods that are commonly used in contemporary stock assessments, and are based on non-equilibrium and equilibrium principles, respectively. Both models require a minimum of 1 year of length composition data together with assumptions about growth, natural mortality, and maturity to estimate the spawning potential ratio (SPR) biological reference point, defined as the proportion of unfished reproductive potential at a given level of fishing pressure (Goodyear, 1993;Hordyk et al., 2015;Rudd and Thorson, 2018).
Firstly, a comparison was undertaken between the two methods using an operating model to produce data for simulated populations (P. argenteus and S. taty in the Bohai and Yellow Seas) under different length data groupings with three scenarios of fishing mortality and recruitment variability combinations ( Figure 1). All simulation modeling of populations of the two species were conducted using the operating model in the R package LIME from which estimates of SPR were obtained for each scenario. The performance of the ensemble models using the simulated datasets was evaluated by using 80% of the simulated data for training and the remaining 20% for testing. Finally, based on insights about the robustness of the methods, the stock status of P. argenteus and S. taty in the Bohai Sea were estimated (Figure 2).

Operating Models and Data Generation
Population estimates for simulation were generated by an operating model (all estimation was completed via the R package Template Model Builder, TMB, Kristensen et al., 2016) and developed in the LIME package (Rudd and Thorson, 2018;1 ) in the statistical software R (R Development Core Team, 2019). Rudd and Thorson (2018, Table 2 and Table 3 in their manuscript) described the population dynamic equations used in the operating model and functions for generating data. For natural mortality of P. argenteus and S. taty in the Bohai and Yellow Seas, the respective values ( Table 1) were assumed to be constant so that in each instance the median value of natural mortality was chosen from estimates made using multiple methods (the Barefoot Ecologist's Toolbox hosts a tool; Cope, 2017). For other biological and fisheries parameters of P. argenteus and S. taty, values were taken from local studies (Zhao, 1987;Liu et al., 1990;Tang and Ye, 1990;Chen, 1991;Jin et al., 2005a;Chen et al., 2018;Xu et al., 2019). Life history parameters for growth and length-at maturity are also listed in Table 1.
Levels of initial biomass depletion compared to carrying capacity of each simulated population were drawn from a uniform distribution between 0.35 and 0.95, and three scenarios involving combinations of fishing mortality and recruitment variability were explored (Figure 3): 1 | Life history parameters for each species assumed in the models; where L ∞ and K are the von Bertalanffy asymptotic length and Brody growth coefficient, respectively and L 50 is the length at which 50% of the population is mature.

Species
Max age ( (1) "constant" scenario: a constant exploitation rate and recruitment assumptions; (2) "endogenous" scenario: an exploitation rate coupled with biomass, and a Beverton-Holt spawner-recruit function (Beverton and Holt, 1957); and (3) "one-way" scenario: a Beverton-Holt spawner-recruit function and an exploitation rate coupled with fishing mortality changing linearly from the rate that would result in the randomly chosen initial depletion to F 20% which was calculated deterministically based on the biological information and selectivity associated with each lifehistory type (Rudd and Thorson, 2018; see text footnote 1).
Scenario (2) and (3) included a standard deviation for fishing mortality equal to 0.2 (Rudd and Thorson, 2018), a standard deviation of recruitment residuals equal to 0.737 and a firstorder autoregressive coefficient equal to 0.426, the mean of the predictive distribution from a meta-analysis of recruitment variability in global fish orders . Selectivity at length was assumed to follow a two-parameter logistic model (eq. 4 in the Rudd and Thorson, 2018), with an estimated parameter length at 50% selectivity and a second parameter representing the difference between length at 95 and 50% selectivity. We repeated this process running 100 iterations or replicates for each combination for each population to generate the simulation data. More information can be found on Github (see text footnote 1) and in Rudd and Thorson (2018).
A sample size of 250 individuals was assumed, which approximates the more realistic sample sizes of P. argenteus and S. taty measured annually in the Bohai Sea. Although our ensemble models were only fitted to estimated values obtained from individual length-based models with length data as the sole input, the performance of the LIME model with rich data assumptions was nevertheless tested. This helped to gauge the level of results that could be deemed acceptable when individual length-based models were fitted solely to length data. Therefore, a total of 13 data-availability scenarios for LIME were set up: (1): "Rich20" with 20 years of catch, abundance index, and length data; (2) "Rich10" with 10 years of total catch, abundance index, and length data; (3) "Rich5" with 5 years of total catch, abundance index, and length data; (4) "I_LC20" with 20 years of abundance index, and length data; (5) "I_LC10" with 10 years of abundance index, and length data; (6) "I_LC5" with 5 years of abundance index, and length data; (7) "C_LC20" with 20 years of total catch, and length data; (8) "C_LC10" with 10 s of total catch, and length data; (9) "C_LC5" with 5 years of total catch, and length data; (10) "LC20" with 20 years of length data; (11) "LC10" with 10 years of length data; (12) "LC5" with 5 years of length data; and (13) "LC1" with 1 years of length data. Only one data availability scenario was set up for LB-SPR: "LBSPR" with 1 year of length data.

Individual Length-Based Models and Testing Performance
Two individual data-limited length-based models that use simulated data and basic life-history and/or fishery parameters to estimate SPR were fitted. These length-based models were chosen because they can be fitted to a majority of fisheries around the world, are well established in the literature, and have been extensively simulation-tested (Rudd and Thorson, 2018;Chong et al., 2019;Pons et al., 2019Pons et al., , 2020Halim et al., 2020).
Length-based spawning potential ratio is one of the prominent length-based methods for estimating reference points in datalimited fisheries and enables a rapid assessment of stock status relative to unfished levels. This model assumes equilibrium conditions (invariant recruitment and mortality) by using a static, equilibrium-based relative age-structured model Prince et al., 2015). Since the status determination of the LB-SPR model is based on 1 year of data at a time, we only used the length data of the final year to calculate the SPR of that year. The inputs to the LB-SPR included the ratio of the natural mortality coefficient to the von Bertalanffy growth coefficient; length data in each year; the von Bertalanffy asymptotic length parameter; the lengths where 50 and 95% of the fish are mature; and length-weight parameters. LB-SPR estimates the ratio of fishing mortality to natural mortality and the lengths at 50 and 95% selectivity to best fit the predicted and observed length composition proportions, and derives SPR, outputting estimates for these four values for each year with length data . In this study, we used the LB-SPR package version 0.1.5 in R (Hordyk, 2017).
Length-based integrated mixed effects is an age-structured population dynamics mixed model and can account for variable fishing mortality and recruitment when there are only length data from a single year as well as assumed lifehistory parameters; including the length-at-age relationship, von Bertalanffy growth parameters, allometric length-weight parameters, natural mortality coefficient, and length at 50% maturity (Rudd and Thorson, 2018). It can also accommodate multiple years and types of data, including both length data and index data and/or catch data, in an integrated manner to improve the estimation of changes in fishing mortality over time (Rudd and Thorson, 2018). LIME estimates lengths at 50 and 95% selectivity to the fishing gear; the Dirichletmultinomial parameter; the recruitment standard deviation and annual fishing mortality coefficient as fixed effects; and estimates of annual recruitment as random effects. Compared with LB-SPR, when fitting length data from more than 1 year, LIME does not assume equilibrium conditions when recruitment is estimable. In our model runs, when the final gradient for all parameters was less than 0.001, the model converged. For each combination of life-history type, data availability scenario, fishing mortality pattern, and recruitment dynamics, we obtained 100 iterations of generated data and ran the estimation model for each set. In this study, we used the LIME package 2.1.3 version (Rudd and Thorson, 2018).
To assess the performance of the LIME and LB-SPR models under different scenarios we compared the outputs to the simulated "truth" from the operating models, and used the relative error (RE, [estimated-true]/true) as an evaluation indicator.

Building the Ensemble Models and Testing Performance
The individual models combined as ensembles were intended to provide estimates of recent stock status (SPR). Therefore, the ensembles were applied to estimate SPR in the last year of the data series, which was of interest for both of management and conservation purposes. The SPR were used in the last year as the response variable and the predictions from the individual models (LIME and LB-SPR) as predictors in our ensemble models (Figure 2).
A model average for each population and four ensembles of varying complexity were compared: a random forest (RF), a support vector machine (SVM), a linear model (LM), and a boosted regression tree (generalized boosted regression modeling, GBM). The four ensemble models were fitted for each population separately (e.g., RF_PA for P. argenteus only, RF_ST for S. taty only), as well as fitting the models for a set comprising data from populations of both species (e.g., RF for both P. argenteus and S. taty).
These models can be described in terms of θ , the ensemble estimated SPR. The individual model estimates of SPR are represented asx. The model average for each population was calculated as: The linear model for each individual population or both populations combined was calculated as: Support vector machines, based on mapping input space to a high-dimensional feature space where linear separation is easier than input space, provide a popular machine learning method and yet also represent a powerful technique for general (nonlinear) classification, regression and outlier detection with an intuitive model representation (Cortes and Vapnik, 1995;Bennett and Campbell, 2000;Chang and Lin, 2011). SVMs were initially used to train a data set (SPR in the last year of individual models fitting 80% of the iterations for simulated data under all fishing and recruitment scenarios) to obtain a model and secondly, the resultant ensemble model was used to predict information from a testing data set (SPR in the last year of individual models from fitting the remaining 20% iterations of simulated data under all fishing and recruitment scenarios). SVMs, based the pre-processing strategy in learning by mapping input space to a high-dimensional feature space, are used to find an optimal hyper-plane which maximizes the margin between itself and the nearest training examples in the new high-dimensional space and minimizes the expected generalization error (Seo, 2007). We fit SVMs with the e1071 package (Meyer et al., 2021) for R.
Generalized Boosted Regression Modeling (GBM), based on regression trees, is a type of regression model that is highly flexible, and has considerable success in predictive accuracy by maintaining a monotonic relationship between the response and each predictor (Friedman and Tibshirani, 2000;Ridgeway, 2007;Al-Mudhafar et al., 2016). GBM via the R gbm package (Ridgeway, 2007) was fitted using the default argument values.
Random forest (RF) modeling is also based on regression trees. In RF, each tree depends on the values of a random vector sampled independently and with the same distribution for all trees in the forest (Breiman, 2001). We fitted RF using the random Forest package (Liaw and Wiener, 2002) for R with the default argument values.
Repeated onefold cross-validation was used by randomly dividing the dataset into two sets, then building the ensemble model on eight-tenths of the data and evaluating its predictive performance on the remaining two-tenths. To assess the ability of the ensemble models to accurately and precisely estimate quantities of management interest i.e., SPR, the median absolute relative error (MARE) was used as an evaluation indicator to quantify precision between estimated and true SPR in the last year of data across the 20% iterations of simulated data.
Applying the Ensemble Models in the Assessment of Pampus argenteus and Setipinna taty in the Bohai Sea As an example, the LIME and LB-SPR models were applied to stocks of species of interest (i.e., P. argenteus and S. taty in the Bohai Sea) and then the resultant SPR estimates of stock status were used as data in the previously built ensemble models (i.e., RF, RF_PA, RF_ST, SVM, SVM_PA, SVM_ST, GBM, GBM_PA, GBM_ST, LM, LM_PA, LM_ST, and Model average). The Bohai Sea is an important spawning ground for the Bohai-Yellow Sea stocks of P. argenteus and S. taty thereby playing an important role in future years' stock recruitment (Liu et al., 1990;Jin et al., 2005aJin et al., , 2006Jin et al., , 2015. May to July is the spawning period for the Bohai-Yellow Sea stock of P. argenteus and July to November is the main feeding period. At the end of November, the stock gradually migrates toward the Yellow Sea for overwintering (Liu et al., 1990;Jin et al., 2005aJin et al., , 2006. In the Bohai Sea, S. taty has been the dominant species since the 1980s and plays an important role in the nearshore fish community (Jin et al., 2006). Setipinna taty spawn in the Bohai Sea from mid-May to June, and then return to the wintering ground in the Yellow Sea until the end of November (Liu et al., 1990;Jin et al., 2006Jin et al., , 2015. The application of ensemble models in this study was based on P. argenteus and S. taty length data that were collected by fixed-station bottom trawl surveys conducted in the Bohai Sea during spring (May), summer (August-September), autumn (October-November) and winter (December) of 2016-2019 by the Yellow Sea Fisheries Research Institute, Chinese Academy of Fishery Sciences (Figure 4).
The modeling results can potentially inform a harvest strategy that targets a fishing mortality rate that is expected to result in 40% of unfished spawning output (termed "SPR 40% "), which is considered risk-averse for many stocks with very low resilience (Clark, 2002). SPR 30% is considered a threshold below which a stock is considered to have been overfished (Clark, 2002;Nadon et al., 2015;Rudd and Thorson, 2018). Therefore, these values were calculated as possible fishing mortality reference points to evaluate the status of the P. argenteus and S. taty in the Bohai Sea to inform future fishery management decisions.

Simulation Testing: Individual Length-Based Model Performance Across Fishing Mortality, Recruitment Variability, and Data Scenarios
Simulation testing demonstrated that the RE distributions of the LIME and LB-SPR methods under varying conditions were relatively dispersed compared with those under equilibrium conditions (Figure 5). In addition, the RE distribution for P. argenteus was relatively dispersed compared with that for S. taty. When the time-period of data becomes shorter, the bias (bias is measured as MRE) of the LIME estimator for both P. argenteus and S. taty will become larger.
Our results showed that for P. argenteus, both LIME and LB-SPR can estimate unbiased SPR when length data are available and biological characteristics are correctly specified across various scenarios of fishing mortality and recruitment  patterns (Figure 5). Compared with P. argenteus, the SPR estimated for S. taty by both LIME and LB-SPR had a large relative error. For P. argenteus, the mean bias in SPR of LIME and LB-SPR was 0.008 (from −0.005 to 0.017) across all data scenarios under equilibrium conditions (constant scenario). Under an endogenous scenario, the mean bias in SPR of LIME and LB-SPR for P. argenteus was 0.014 (from −0.026 to 0.085) across all data scenarios, and the SPR from LIME demonstrated relatively large deviations under scenarios with only length and catch data (C_LC). Under a one-way scenario, the mean bias in the SPR of LIME and LB-SPR for P. argenteus was 0.088 (from 0.039 to 0.163) across all data scenarios, and the SPR of LIME also demonstrated relatively large deviations under scenarios with only length and catch data (C_LC). In general, LB_SPR and LIME with length only data had a similar bias for P. argenteus.
For S. taty, the mean bias in the SPR of LIME and LB-SPR was 0.331 (from −0.176 to 0.405) across all data scenarios under equilibrium conditions. Under endogenous scenarios, the mean bias in SPR of LIME and LB-SPR for S. taty was 0.243 (from −0.146 to 0.379) across all data scenarios. But those RE distributions had relatively large deviations compared with those obtained under equilibrium conditions. Under a one-way scenario, the mean bias in SPR of LIME and LB-SPR for S. taty was 0.348 (from −0.217 to 0.576) across all data scenarios. In general, for S. taty, the LB-SPR estimate was lower than its true value, while the LIME estimate was higher than its true value. It seems that a persistent bias exists for both individual length-based assessment methods compared with the true values in the OM.

Performance of the Ensemble Models
Ensemble methods, and in particular the machine-learning ensemble models (support vector machines, SVMs), generally improved the estimates of stock status beyond those of any individual model ( Table 2). Compared to the LIME models with LC20, LC10, LC5, LC1, and SVM ensembles (SVM20, SVM10, SVM5, and SVM1) decreased the MARE (median absolute relative error to quantify precision) by averages of 0.257, 0.252, 0.243, 0.130, respectively for S. taty and −0.014, 0.004, 0.011, 0.036, respectively for P. argenteus. Compared to the LB-SPR model, SVM ensembles (SVM20, SVM10, SVM5, and SVM1) decreased the MARE by averages of 0.161, 0.150, 0.148, 0.136, respectively for S. taty and 0.021, 0.019, 0.011, −0.006, respectively for P. argenteus. Compared to the LB-SPR model, only SPR from individual models with the simulation of individual species (S. taty or P. argenteus) can be used to build ensemble models with higher accuracy than those fitted by both species in combination.

Assessments of Setipinna taty and Pampus argenteus in the Bohai Sea Using Ensemble Models
We applied ensemble models using the SPR from LIME and LB-SPR with the length composition data from S. taty and P. argenteus from the surveys conducted in the Bohai Sea during 2016-2019 ( Table 3). The optimal ensemble model (SVM) showed that the SPR estimates for P. argenteus and S. taty in 2019 were 0.388 and 0.525, respectively. These results (> SPR 30% ) indicated that there was no evidence of overfishing for P. argenteus and S. taty in the Bohai Sea. Notwithstanding this result, the SPR of P. argenteus (0.388< SPR 40% ) indicated that extra monitoring attention is needed, and further protection warranted to reduce the risk of this stock becoming overfished. Most ensemble models tested produced consistent results.

DISCUSSION
Applications of ensemble models were firstly applied to empirical data acquired from two data-poor Chinese stocks (P. argenteus and S. taty), which were then extended in a combination of equilibrium-based (LB-SPR) and relaxed equilibrium-based (LIME) length-based methods. As a mixed model, LIME attributed changes in the length composition data to variability  in recruitment, keeping selectivity constant over time (Rudd and Thorson, 2018). LB-SPR allowed selectivity to change between years, but assumed recruitment was constant over time . Combined with simulation testing, the empirical examples provide a pathway for application of these ensemble methods to more data-limited stock assessments of Chinese fish species for which catch data are unavailable. Application of the simulated dataset of known stock status showed that the individual models had variable success at estimating SPR in the last year of the data series, but with large deviations. Simulation testing comparing the performance of the LIME and LB-SPR methods in estimating SPR showed that LIME and LB-SPR performed better for P. argenteus (maximum age 6, asymptotic length (cm) 30.4, growth coefficient 0.25, and Natural mortality 0.817) across a range of life history, fishing mortality, and recruitment scenarios compared with that for S. taty (Maximum age 4, asymptotic length (cm) 20, growth coefficient 0.62, and Natural mortality 1.35). The results indicate that LIME overestimates SPR and LB-SPR underestimates SPR, thereby conferring poorer ability to estimate stock status for the rapidly growing, short-lived, small Engraulidae fish. Therefore, the conclusion of Rudd and Thorson (2018) that "with only length data, LIME performs well for the shorter-lived fish" which they inferred from studying only one species of short-lived fish needs further study for verification.
Ensemble methods provide a useful approach to situations where environmental resource management decisions must be made on the basis of multiple, potentially contrasting estimates of status (Anderson et al., 2017). Our results suggest that choosing a support vector machine ensemble model that allows for nonlinear relationships provides additional insight into individual model behavior and generally performed the best among all the models that we tested. However, not all ensemble models performed at their best in a variety of situations. The number of models included, and the structural forms of those individual component models, can greatly influence the performance of ensemble models (Ali and Pazzani, 1996;Dietterich, 2000;Tebaldi and Knutti, 2007). Ensemble models can exploit the best predictive performance of each of their individual component models which perform well under different conditions (Anderson et al., 2017). Whether the training dataset represents the dataset of interest will affect the performance of ensemble models (Knutti et al., 2009;Weigel et al., 2010), which is why we trained our ensemble models by considering the datasets produced by the various scenarios described in this paper. Inclusion of the dataset of interest in our training dataset, which improved the objectivity of the predictive performance of our models during their verification (Hastie et al., 2009).
Ensemble methods do not require theoretical information, or the further development of multi-model inference, because they can combine different types of models and model predictions via nonlinear functions that are tuned to known data (Stewart and Martell, 2015;Anderson et al., 2017). The results of our simulation testing illustrated that ensemble models can improve population status estimates of fish species in situations where catch data are unavailable. This is consistent with the results from ensemble modeling when compared with catchbased models worldwide (Anderson et al., 2017). The ensemble approach can provide a framework to enable managers to focus on decision-making rather than selecting from among assessment models or assumptions within those models (Stewart and Martell, 2015). The results also suggested that fundamentally different stock assessment modeling paradigms, LIME and LB-SPR, can be included in such an approach, and that it is more robust and convenient for assessing the stock status of Chinese fish species without catch data than attempting to select a single best length-based model. The challenges associated with diagnosing why different models (i.e., LIME and LB-SPR) yield different results, and the implications of these differences on management decisions is all the more reason for us to pursue ensemble modeling.
Over the past 30 years, many scholars have focused on the stock status of S. taty and P. argenteus in the Bohai Sea (for example, Tang and Ye, 1990;Jin et al., 2005bJin et al., , 2006. Based on the composition data of the fork length of S. taty from 1983 to 1985, Tang and Ye (1990) used a cohort analysis (Jones, 1981) to estimate that the resource utilization rate of S. taty in the Bohai and Yellow Seas is at an intermediary level of utilization. Compared with the 1980s, the biomass of P. argenteus and S. taty estimated from area-sweeping and acoustic survey methods has greatly reduced during the early 21st century (Jin et al., 2005b(Jin et al., , 2006. The concomitant limited availability of data has hindered the development of fish stock assessment and fishery management for both species. A lack of catch records for S. taty has limited the use of conventional assessment methods, so that this species cannot be assessed quickly and accurately. Recorded catches of P. argenteus have often included several related species, thereby confounding previous evaluations of its stock status (see Liu et al., 2013). Our application of ensemble methods by combining length-based models provides a solution which facilitates accurate evaluation of the state of stocks in which mixing of species in catches or other deficiencies in reported catches may occur. Through the application of survey data, the results from the ensemble modeling indicated that neither S. taty nor P. argenteus in the Bohai Sea was at an overfished level (of stock biomass) in 2019, but that the stock status of P. argenteus at an SPR <40% indicated that this species deserves special attention to ensure that it does not become more depleted. However, Jin et al. (2005bJin et al. ( , 2006 reported that the population structures of P. argenteus and S. taty have been trending toward younger cohorts of smaller fish during recent years (Jin et al., 2005b(Jin et al., , 2006. Results from the present study (SPR of 39% for P. argenteus and 53% for S. taty) when considered in conjunction with the previous work cited in this paper lend support to a risk-averse argument that managers should take measures to protect the stocks of both species to ensure their sustainable use and their role in the ecosystem (i.e., to increase the SPR of P. argenteus above 40% and maintain the SPR of S. taty well above 40%).
The ensemble modeling based on individual length-based models was initially used for the assessment of fishery species for which catch data were unavailable, but there is certainly ample scope for improvement. Ensemble models provide scope to combine a larger number of length-based models with differing structures and assumptions (Stewart and Martell, 2015) to explore whether these combinations can further improve the estimation performance of the ensemble.
Performance of ensemble models could be improved by optimizing the estimation of their individual component models. For LIME, this means collecting more years of length data; taking more independent length measurements during each year of length data collection; and conducting more surveys to acquire monitoring index data from which to derive estimates of variability in fishing mortality and recruitment. Sampling costs can be weighed against model performance to ascertain an appropriate number of fishery independent length measurements.
The impact of mis-specifying parameters should also be considered during future application of these models. A next step for ensemble models would be to apply Bayesian priors on biological parameters to more thoroughly represent the uncertainties in population parameter estimates relevant to management (e.g., from FishLife; Thorson et al., 2017). Sensitivity tests and likelihood profiles should be conducted on different levels of shaped selectivity to understand how SPR may become biased if the model structure is mis-specified.

DATA AVAILABILITY STATEMENT
The datasets that support the findings of this study are available from the corresponding author upon reasonable request.

ETHICS STATEMENT
The animal study was reviewed and approved by Animal Ethics Committee, Yellow Sea Fisheries Research Institute.

AUTHOR CONTRIBUTIONS
QH made substantial contributions to conceptualization, methodology, software, validation, formal analysis, data curation, writing -original draft, and visualization. XS made substantial contribution to conceptualization, methodology, data curation, and writing -review and editing. XJ, TY, and CS made substantial contribution to conceptualization, data curation, and writingreview and editing. HG made substantial contribution to conceptualization, writing -review and editing. All authors contributed to the article and approved the submitted version.