The Interpretation of Particle Size, Shape, and Carbon Flux of Marine Particle Images Is Strongly Affected by the Choice of Particle Detection Algorithm

In situ imaging of particles in the ocean are rapidly establishing themselves as powerful tools to investigate the ocean carbon cycle, including the role of sinking particles for carbon sequestration via the biological carbon pump. A big challenge when analysing particles in camera images is determining the size of the particle, which is required to calculate carbon content, sinking velocity and flux. A key image processing decision is the algorithm used to decide which part of the image forms the particle and which is the background. However, this critical analysis step is often unmentioned and its effect rarely explored. Here we show that final flux estimates can easily vary by an order of magnitude when selecting different algorithms for a single dataset. We applied a range of static threshold values and 11 different algorithms (seven threshold and four edge detection algorithms) to particle profiles collected by the LISST-Holo system in two contrasting environments. Our results demonstrate that the particle detection method does not only affect estimated particle size but also particle shape. Uncertainties are likely exacerbated when different particle detection methods are mixed, e.g., when datasets from different studies or devices are merged. We conclude that there is a clear need for more transparent method descriptions and justification for particle detection algorithms, as well as for a calibration standard that allows intercomparison between different devices.


INTRODUCTION
Optical measurements of particles in the ocean are rapidly establishing themselves as powerful tools to investigate ocean biogeochemical cycles and food webs (Lombard et al., 2019;Giering et al., 2020). One research area that has greatly benefited from the use of underwater camera and optical sensors is the ocean carbon cycle -specifically the biological carbon pump. The biological carbon pump describes the collective processes that transport organic carbon from the surface ocean to depth, for example, through sinking particles (Volk and Hoffert, 1985). Vertical profiles of particle images can elucidate the processes that determine particle size, type and distribution. Combined with information on carbon content and sinking velocity, particle profiles can provide high-resolution information on carbon fluxes and hence ocean carbon storage (see review by Giering et al., 2020).
A big challenge when extracting individual particle properties from camera images is determining what portion of the image is a particle and how big it is. Particle size is a crucial parameter as it is used as input for various conversions, in particular, to estimate particulate organic carbon (POC) content and sinking velocities (e.g., Alldredge and Gotschalk, 1988;Alldredge, 1998;Laurenceau-Cornec et al., 2015). The shape of a particle (e.g., how round or solid a particle is) can inform about the particle's density, drag and type (e.g., Laurenceau-Cornec et al., 2015).
A key optical processing decision for particle detection and sizing is the algorithm used to decide which part of the image forms the particle and which is the background. A commonly used technique is to apply an intensity threshold. A threshold is typically a gray-scale value for transformation of the image into a black-white (or "particle non-particle") binary field on which particle statistics can be calculated. To date, there is no standard procedure in determining the threshold and a wide range of algorithms exist, most of which calculate a threshold value based on the gray values of all pixels in the image (e.g., based on the histogram of pixel values). Users often apply the default threshold algorithm that is provided as part of their favorite image analysis toolbox. As image analysis toolboxes often offer a range of threshold algorithms, different users may use different threshold algorithms even when using the same toolbox. As the threshold choice is often only a simple click in a lengthy sample and data analysis sequence, it has received little attention and is often left unmentioned in the method description. For example, for one of our papers (Giering et al., 2016), Giering photographed marine snow aggregates collected using the Marine Snow Catcher and analyzed these using ImageJ. Not knowing better at the time, she used the default threshold algorithm (a variation of the IsoData algorithm) without explicitly stating this in the methods. Durkin et al. (2015) took photos of marine snow particles in gel traps and also analyzed these in ImageJ, albeit using the threshold algorithm "Intermodes." Many other studies also mention the imaging toolbox but do not explicitly state their algorithm choice (e.g., for ImageJ: Grossart et al., 2006;Lyons et al., 2007;Zhao et al., 2017;Flintrop et al., 2018).
To complicate the issue further, with increasing computing power, more sophisticated algorithms to detect particles are becoming common, such as edge detection, ridge detection, and even object detection using machine learning. Edge detection searches through an image and identifies areas of high frequencies, i.e., large variances in pixel values, and traces the outline of an object (Basu, 2002). Ridge detection algorithms work similarly to edge detection algorithms but, rather than detecting edges, they have been optimized to find lines. Object detection algorithms, using supervised learning, use previously labeled images to learn the visual characteristics of known objects in order to identify similar objects in new data (Zhao et al., 2019). For marine snow and zooplankton imaging, edge detection algorithms such as the Sobel and Canny algorithms have been proven to be useful (Sosik and Olson, 2007;McDonnell, 2011;Thompson et al., 2012;Yu et al., 2016;Ohman et al., 2019).
In this study, we explore how the choice of threshold or edge detection algorithm affects the final estimate of particle size and, further down the line, carbon content and flux. To do so, we used vertical particle profiles collected by the LISST-Holo deployed during the UK COMICS (Controls over Ocean Mesopelagic Interior Carbon Storage) programme . These deployments supplied us with plenty of imaged particles to run a series of particle detection experiments. Imaged particles ranged from ∼0.01-3.5 mm in diameter and thus covered the typical range of particle sizes that make up the bulk sinking particle fluxes (e.g., McDonnell and Buesseler, 2010). The choice of device is secondary and our experiments could (and should) be carried out on images taken by any camera system.
Our main questions were: 1. How does the choice of algorithm influence particle size (and hence POC content and flux estimates)? 2. How does the choice of algorithm influence particle shape and structure?
To address these questions, we first implemented a sophisticated segmentation routine. This step is common practice as imaging devices often take images of a water volume that can contain several particles surrounded by "empty space, " Segmentation isolates individual particles (hereafter referred to as "vignettes"), allowing particle-specific analyses whilst decreasing file sizes. Though our segmentation routine includes threshold algorithms, we do not explore the effect of thresholding on segmentation as it is not the purpose of this study. Once segmented, we analyzed each vignette using a range of static threshold values and 11 different algorithms (seven threshold and four edge detection algorithms). We compared the final estimates of particle size and shape and evaluated the effect on final POC estimates.

Site Description
Vertical profiles of particles were imaged using the LISST-Holo (Sequoia, US) in two contrasting open ocean sites as part of the UK COMICS programme . The first cruise investigated the diatom bloom in a highly productive region downstream of South Georgia in Nov/Dec 2017 (cruise DY086), the second cruise targeted the low oxygen regions offshore of Namibia in the Benguela current in May/Jun 2018 (cruise DY090). We present data from nine profiles in total: six profiles from the South Georgia sites (cruise DY086) and three profiles from the Benguela current (cruise DY096).

Particle Imaging
The LISST-Holo (Graham and Nimmo Smith, 2010) was mounted on the "Red Camera Frame" (a modified lander platform) and operated as a stand-alone system with an external battery pack. For each profile, the Red Camera Frame was deployed to (∼230 m taking a holographic image with a volume of 1.86 cm 3 every 1.2-2.5 m. The holographic image records the interference pattern caused when a collimated light beam (658 nm solid state diode laser) passes through a water sample. Objects in the water sample cause light to scatter and results in a unique interference pattern containing information about the size and position of the object. Using this interference pattern, objects can be reconstructed using the supplied software, HoloBatch 1 , producing in-focus monochrome images with a resolution of 4.4 µm per pixel and a frame size of 1600 × 1200 pixels.

Image Reconstruction
The reconstructed images were processed with slight adjustments to the default parameters to produce sharper and more detailed reconstructions at the cost of greater memory requirements and computation time. These included setting the step size to 0.05 mm to sample the volume at smaller intervals, and setting the clean stack parameter to 5% in order to remove a portion of pixels potentially contributing to noise. HoloBatch was run on a machine with 64 GB RAM and an Intel Xeon E5-2623 v3 CPU (3.00 GHz).

Particle Segmentation
The reconstructed monochrome images generated using the HoloBatch software were typically composed of multiple 1 sequoiasci.com/product/lisst-holo particles. To perform analysis on each individual particle we first performed segmentation and produced cropped images around each detected region. HoloBatch offers the functionality to extract these vignettes. The detection of particles is performed during image reconstruction (Graham and Nimmo Smith, 2010;Davies et al., 2015), which is a highly computational and timeconsuming process. Depending on the parameters, vignettes may not contain a single particle but rather multiple particles, or multiple closely connected particles could be regarded as a single particle. Here we applied our own workflow for generating vignettes from the reconstructed images in order to have more control over the segmentation process.
Segmentation was achieved using our purpose-built Python package, Planktonator 2 . The workflow involved six key stages (Figure 1): Otsu thresholding, low pass filtering, thresholding and flood fill, contour calculation, particle measurement, and finally particle filtering and extraction. Please note, even though our segmentation routine included thresholding, we are not investigating the effect of thresholding on segmentation in this manuscript and it is therefore not further explored. The reason for the sophisticated routine was the complex nature of the imaged particles. Many of the particles were large phytoplankton chains or colonies made up of clearly identifiable individual cells (Figure 2). A simple segmentation routine would separate these large aggregates into artificially small particles, introducing problems for both sizing and future classification. We therefore optimized our segmentation routine for our dataset. Other instruments may apply their own segmentation routines or do not require any (e.g., for photos of individual particles).
We first used Otsu (1979) to convert the monochrome input image into a binary image by minimizing the intra-class variance between the background (water) and foreground (particles), leaving the most prominent pixels with a new pixel intensity of 0. Otsu is also applied as part of the HoloBatch software (Graham and Nimmo Smith, 2010;Davies et al., 2015) and is one of the most commonly referenced thresholding techniques (Sezgin and Sankur, 2004). In the following step a low pass filter with a kernel size of 25 and linearly decreasing spatial weights was applied to produce a blurred non-binary image. This step was used to connect neighboring pixels while maintaining the overall particle shape. The proximity of neighboring (particle) pixels also determines the intensity of pixels in the filtered image; the intensities of isolated and detached pixels (i.e., noise) are reduced while the intensities of tightly compacted pixels are maintained. A threshold was then applied to the filtered image to convert back to a binary image. A lower threshold reduces the detected particle area while a higher threshold increases it. We manually defined this value to be 0.96 (in the range [0,1]) as this produced satisfactory results after visual inspection. The result is an image composed of black regions corresponding to particle areas. As it is possible that a number of these areas contained gaps/holes, we filled holes using a flood fill operation before proceeding to the next stage. Contours were calculated from each particle region using scikit-image 3 . After obtaining the contour positions we 3 scikit-image.org measured each particle to determine their major length. Only particles with a minimum major length of 40 pixels (176 µm) were used for further analyses, as smaller particles typically lacked sufficient detail to be identified. Note that this size threshold was only applied during the segmentation process. The final step was to produce image crops around the detected contours regions. These crops contain data from the original monochrome images generated by HoloBatch and therefore have not been processed by thresholding. The objective of this workflow was to produce vignettes containing a single marine snow particle, and parameters were defined carefully through visual inspection. Nonetheless, without ground truth data, we were unable to verify the outputs objectively and some vignettes may still contain multiple particles.

Thresholding
For size measurements, vignettes were analyzed using Python and scikit-image (see text footnote 3). Vignettes were 8-bit monochrome PNG files with gray values from 0 (black) to 255 (white). The particle in each vignette was identified using a range of static threshold values and 11 different algorithms (seven threshold and four edge detection algorithms). All algorithms used here are provided in the scikit-image filters and features modules. All vignettes (n = 6400) and their metadata including size measurements are provided in the Supplementary Material. Examples of the code can be found in the Supplementary Material and at github.com/brett-hosking/supplementary.
The application of a gray-value threshold is straightforward. A gray value between 0 (black) and 255 (white) is chosen, any pixel below the threshold is considered particle, and any pixel at or above the threshold is considered background. The result is a true-false (binary) image separating the particle and background (also referred to as a "mask"). Particle measurements can then be performed on the binary mask. We tested a range of static thresholds, i.e., we applied the same threshold value to all vignettes. Static thresholds ranged from 10 to 250 in 10unit increments. In addition, we applied seven histogram-based thresholding algorithms: Otsu, Isodata, Li, Mean, Minimum, Triangle and Yen. These algorithms find a unique threshold value for each vignette based on the image properties (the grayvalue histogram). A description and illustration for each of these algorithms is provided in Table 1.
Edge detection algorithms effectively analyze each pixel and its neighbors to detect changes in pixel intensity. A gradient mask is produced that depicts strong gradients as lighter pixels. Depending on the algorithm, the gradient mask is thresholded to retain only the strongest gradients ("edges"). Pixels corresponding to an edge are assigned a "true" value, whereas pixels that are similar to their neighbors are assigned the value "false." The resulting binary image hence traces the edges of the particle. Using a flood fill operation, all pixels within the detected edges are filled, producing the final binary mask. We applied four common edge detection algorithms: Canny, Sobel, Scharr, and Roberts. A description and illustration for each of these algorithms is provided in Table 1.
We used Otsu as the reference threshold algorithm as it is, as described above, used in the HoloBatch software (Graham and  Finds threshold that minimizes the intra-class variance Ref: Otsu, 1979 Threshold Isodata Takes an initial threshold and averages the pixels below and above the threshold. The averages of these two values are calculated. Threshold is incremented and the process is repeated until the threshold is larger than the composite average Ref: Ridler and Calvard, 1978  Edge Sobel Performs 2D spatial gradient measurements by performing convolution between two 3 × 3 kernels and the image Ref: Sobel and Feldman, 1973 Histograms show the distribution of gray levels from 0 (black) to white (255) for the original image (left). Red line indicates the threshold identified by the algorithm, which is applied to generate the thresholded image (right).
Nimmo Smith, 2010; Davies et al., 2015) and our segmentation routine. Moreover, Otsu is one of the most commonly applied thresholding techniques (Sezgin and Sankur, 2004), and it generally produced (subjectively) reasonable results.

Particle Size and Shape
Once the algorithms were applied to the vignette, we calculated the particle's size based on the pixel count for the following parameters: area, equivalent spherical diameter (ESD), convex area and Feret length (details in Table 2). The shape of the particle was explored in terms of roundness, i.e., how similar the overall particle shape is to a circle, and solidity, i.e., how densely packed the particle is ( Table 2). Both parameters range from 0 to 1, with a value close to 1 suggesting the particle is, respectively, nearly circular or solid. Please note that roundness was calculated using the area of the convex hull rather than the particle area. We chose this metric because many of the large particles were very loosely packed aggregates or colonies. For these cases, the traditional roundness metric (ratio of particle area to Feret length) would suggest that such particles are elongated even though, in reality, they were often round.

POC Content, Sinking Velocities, and Fluxes
POC contents of an individual particle (in µg C) was calculated from the particles volume (V hull in mm 3 ) using a modification of the equation by Alldredge (1998): where V hull is calculated from the hull area [A hull in mm; V hull = 4/3 π (A hull /π) 3/2 ] and S is the solidity of the particle ( Table 2). Hull area was chosen as metric for particle size as it is closer to the area metric used by Alldredge (1998). The original equation by Alldredge (1998) overestimated POC concentrations compared to POC concentrations measured using in situ pumps (53 µm mesh; data not shown), likely because many of the particles that we imaged were very loosely packed. We used the solidity metric (scaled to volume) to account for this discrepancy (i.e., we allowed particles to be porous), which produced an acceptable match between calculated and measured POC concentrations.
Fluxes were calculated by multiplying the POC content of a particle by its estimated sinking velocity. Sinking velocities (v in m d −1 ) were assumed to be a function of the particle's size (ESD hull in mm) following power law and solidity (S): The regression was based on in situ data collected during the COMICS cruises. Briefly, an underwater camera was fitted onto a neutrally buoyant sediment trap and programmed to take a fast sequence of images (deployment depth 500 m). Individual sinking particles (n = 244) were tracked through the images and a downward sinking velocity calculated. Most sinking particles were <1 mm in diameter (median 0.20 ± 0.26 mm ESD) similar to our observations (median 0.33 ± 0.47 mm hull-based ESD detected by Otsu). The regression fit was not strong (p < 0.01, R 2 = 0.08, n = 244) but generally matched previous observations ( Table 3). Particles at 500 m were more solid than particles observed in the upper 250 m of the water column, which likely

Source
Application range (diameter in mm) a b Alldredge and Gotschalk (1988)   Convex area px 2 A hull * Sum of all pixels enclosed by the convex hull. Envelope around particle (Think: rubber band fitted around the particle).
Feret length px F * Also referred to as maximum length. Maximum extend of particle. Calculated from the maximum distance between the corner points of the convex hull.

Roundness
Unitless R A hull /(π (F/2) 2 ) Ratio of the convex area and the area of a circle with a diameter of the particle's maximum length. The closer the value is to 1, the more similar the particle is to a sphere.

Solidity
Unitless S A part /A hull Ratio between area of particle and area of convex hull. The closer the value is to 1, the more solid a particle is.

Asterisks indicates measured values.
Frontiers in Marine Science | www.frontiersin.org reflected differences in excess density with less solid particles being less dense. We therefore applied the same solidity-based scaling as in Eq. (1). Calculated sinking velocities ranged from 0.2 to 46 m d −1 with a median of 8 ± 8 m d −1 , which seems plausible for loosely packed particles of a median size of 0.33 mm in hullbased ESD. Please note that the absolute POC concentrations and fluxes presented here are used primarily for illustration purposes to highlight the effect of different thresholds and threshold algorithms on final estimates. To compare particle profiles, mean ESD, POC concentrations, and POC fluxes were binned in 1-m bins. Variability was often high between adjacent bins making it difficult to compare the different algorithms. We therefore smoothed depth profiles by calculating a running mean and standard deviation (n = 11) for each depth. Hence, flux at 20 m is the mean of fluxes between 15 and 25 m. The flux attenuation rate ("b parameter") was calculated by fitting a power-law to the flux profile below 50 m (Martin et al., 1987).

Particle Profiles
The LISST-Holo imaged a wide range of particle types (Figure 2). The number of detected particles and particle concentrations varied markedly between the different deployments (Table 4), reflecting ecological differences between the profiles. Particle abundance was higher near South Georgia (0.5-3.3 particles mL −1 throughout the sampled water column) compared to the Benguela (<0.1-0.3 particles mL −1 ) ( Table 4). The highest number of particles was imaged near South Georgia on the 16 Nov 2017 (profile DY086-034) with a total of 2707 vignettes (2.6 particles mL −1 ). This high abundance allows us to explore changes in particle characteristics with depth. In contrast, in the profiles in the Benguela, we imaged only a total of 16, 23, and 149 particles, which precludes a meaningful exploration of vertically resolved fluxes.
All particle profiles showed similar depth trends with higher particle concentrations in the upper 100 m and typically <2 particles per mL deeper in the water column (illustrated for profile DY086-034 in Figure 3A). At the northern South Georgia stations, mean particle size (ESD) decreased with increasing depth while roundness and solidity increased (Figures 3B-D). These trends are also apparent at the Southern South Georgia and Benguela stations, though they are less clear owing to the low number of observed particles (data not shown).

Static Threshold
By applying a range of thresholds (10-250) and comparing ESD to that calculated using a threshold of 250, we can clearly see how important the threshold is for the final particle size and flux estimates. First of all, low static thresholds failed to "find" particles in many of the vignettes, i.e., there were no pixels below the threshold gray value and the size was returned as "0" (Figure 4A). Even the static threshold value of 220 returned one empty vignette (out of 6400 vignettes). Estimated ESD increased rapidly between threshold values 160 and 240 at a rate of 1 percentage-point per unit (relative to the ESD at calculated for threshold value 250). In other words, choosing 200 rather than 190 as threshold value resulted in a 10 percent increase in the ESD estimate (from 45 to 55% 250-threshold-equivalent ESD). Accordingly, estimates for sinking velocities, particle POC content and particle-specific POC flux followed the trend in ESD, with a rapid increase in relative values as the threshold value  increases ( Figure 4B). The additive effect of size on flux estimates resulted in a rapid increase in flux estimates with increasing static threshold ( Figure 4C).

Algorithm Choice
The different threshold algorithms calculated a range of thresholds, with median estimates ranging from 126 (Minimum) to 254 (Triangle) (Figure 5)  The choice of threshold algorithm had a marked influence on the estimates of particle size, carbon content, sinking velocity (data not shown) and flux for each vignette (Figure 6). Binary algorithms that detected low thresholds (Li and Minimum) estimated lower particle parameter values, while the algorithms that detected higher thresholds (Mean, Triangle, and Yen) estimated higher particle parameter values. Owing to the conversion steps, these differences magnified from ESD to POC content to POC flux for each particle (Figure 6). For example, the respective estimates using Mean (relative to Otsu) were (median) 112, 129, and 156% for ESD, POC content, and POC flux.
The four edge-detection algorithms all calculated larger particles with, accordingly, higher carbon content, sinking velocities and fluxes. In the order of ascending estimated particle size, the algorithms were Canny,Roberts,and,jointly,Sobel and Scharr (median ESD relative to Otsu 103,133,150,and 150%,respectively). The POC flux for an individual vignette estimated by Sobel and Scharr were up to 670 times that by Otsu. Even though this was an extreme, median flux estimates were still 6.8 times that by Otsu. Vertical flux profiles from the first station (DY086-034 at the northern South Georgia site) followed the expected decrease in flux from the surface to depth for all algorithms, though the overall flux magnitude differed greatly (Figure 7). For example, the flux at 100 m according to Otsu was 278 mg C m −2 d −1 (Figure 7). The algorithms Minimum, Li and Isodata estimated the flux to be 16, 74, and 93% of this value, respectively. On the other hand, the algorithms Yen, Mean, Triangle, Canny, Roberts, Sobel, and Scharr estimated the flux to be much higher at 118,153,153,173,245,318, and 318%, respectively. Hence, when using the Sobel or Scharr algorithms, particle flux at 100 m would have been estimated 886 mg C m −2 d −1 ; a carbon supply to the deep ocean of ∼600 mg C m −2 d −1 more (Figure 7).
In addition, the different algorithms gave slightly different vertical shapes. Fitting a power-law to the flux profile below 50 m (Martin et al., 1987) suggests that the flux attenuation rate ("b parameter") at the first station (DY086-034 at the northern  South Georgia site) varied from 2.29 to 2.96 depending on the flux algorithm (Minimum and Sobel/Scharr, respectively).
We observed a large effect of algorithm choice on median particle size (ESD), POC concentrations, flux estimates, and flux attenuation rate ("b parameter") at all sites ( Table 5).

Solidity and Roundness
Observed particles ranged in solidity from very loosely aggregated (S < 0.1) to solid (S > 0.9). The median solidity of the particle population was strongly dependent on the threshold algorithm (Figure 8). The highest median solidity was estimated by the binary algorithm Minimum, with a median solidity of 0.83. The edge detection algorithms Roberts, Sharr and Sobel diagnosed high solidities with a median of 0.48-0.56. Canny and the other binary algorithms calculated median solidity values between 0.30 and 0.36. The static thresholds below a gray value of 60 also diagnosed very solid particles, but owing to their poor performance (>75% of analyzed vignettes were "empty"), these are not further discussed.
There was no consistent relationship between particle size and solidity, though the density plots suggest two loose clusters (Figure 9). The most prominent cluster comprised of particles with a diameter of around 20 px (∼88 µm) and a relatively low solidity (S∼0.25). There was a second cluster of particles with a similar diameter but much rounder shape (R > 0.75). For all algorithms, particles with an ESD of >30 px (∼132 µm) appeared to become less solid as they got larger.
We next investigated the roundness of particles according to the different algorithms. The population median was similar for all binary (except Minimum) and edge detection algorithms, ranging from 0.36 to 0.42 (Figure 8). The Minimum algorithm estimated particles to be rounder (median: 0.53), while the static thresholds above 60 estimated particles to  Vertical profiles of mean ESD (in mm), POC concentration (in mg C m −3 ), and POC fluxes (mg C m −2 d −1 ; e.g., Figure 7) were calculated for all station using every algorithms. Data was binned in 1-m bins and a running mean (n = 11) calculated. The minimum and maximum value based on all algorithms is presented. "-" indicates that no particles were observed around this depth. be more elongated (median: 0.24-0.39). As static thresholds increased from 140 onward, the median roundness increased consistently (Figure 8). There was no clear trend between particle ESD and roundness (Figure 9). Please keep in mind, however, that the trends observed here might be a result of the optical device and its resolution rather than an overall ecological phenomenon.

Particle Size
Our error estimates are likely conservative owing to the LISST-Holo methodology: As the LISST-Holo uses diffraction pattern to reconstruct particles rather than conventional photography, the reconstructed particle images have relatively clear edges (Graham and Nimmo Smith, 2010). Conventional cameras often struggle with lighting artifacts such as halo-like boundaries around the particle (e.g., Farhadifard et al., 2017), refraction and semi-transparent edges. It is hence surprising and worrying that, even with well-defined particle edges in the LISST-Holo vignettes, the effect of the threshold algorithm choice on final size and flux estimates are significant. Particle size estimates appear to be highly sensitive to even small changes in threshold values. One advantage of threshold algorithms is that they are easy to understand and sensitivity analyses are uncomplicated. For example, to explore the effect on particle size, a defined number can be added or subtracted from the estimated threshold value before recalculating size. Edge detection algorithms are more complex and the results not as straightforward to evaluate particularly for aggregates with complex textures. These algorithms generally produced particle size estimates that exceeded those of the threshold algorithms. A possible reason for this phenomenon is the nature of the processing used to produce a binary mask from the detected particle outline, and the fact that edges in the binary image were not always clearly defined (i.e., pixels at edge boundaries were often noisy). This noise resulted in detected edges that appeared extruded, thus the binary mask covered a larger area. In addition, the Scharr and Sobel detectors also treat loose and closely connected aggregates as a single continuous particle, producing a mask that covers a large area (see example in Table 1). The Canny edge detector often produced edges that were not enclosed and therefore, when applying the flood fill operation, the masks only represented a portion of the visually determined particle shape. This partial filling explains why this algorithm produced smaller particle size estimates compared to the other edge detectors, though, perhaps surprisingly, the average particle sizes calculated by Canny were the same as those calculated by Otsu.
The effect of such nuances on final flux estimates is potentially huge, with our estimates by different algorithms varying by an order of magnitude. Our data highlights the importance of thorough calibration of the particle sizing routines and detailed method descriptions.
Researchers need to be very careful and considerate when choosing the particle detection and sizing algorithms. Simple visual inspection is a potential first step and may help to select a suitable algorithm for the particle type (e.g., compare raw and binary images in Table 1). However, algorithm choice by eye alone is not sufficient: both Li and Triangle appear to trace the example particle in Table 1 well, but their estimated area differed by ∼20% (1146 and 1386 px 2 , respectively). Rather, the sizing algorithm needs to be calibrated using external reference material. The developers of the Underwater Vision Profiler (UVP) calibrate each newly built UVP by deploying the new UVP alongside a "reference" UVP, which had been calibrated using natural marine snow and zooplankton (Picheral et al., 2010). While this is a good instrument-specific approach, it is not a feasible strategy for comparison between different instrument types and when comparing in situ and ex situ devices. Sosik et al. (pers. comm.) used a range of beads to fine-tune an edge detection algorithm for the FlowCytobot. When applied to phytoplankton cells of a known size-range, the new algorithm calculated much more realistic size estimate (unpublished data).

Particle Shape
The trend of particles to become less solid as size increases matches visual inspection of the vignettes. Many of the small vignettes contained relatively well-contained shapes, whereas many of the large vignettes depicted long diatom chains, such as Eucampia spp. (Figure 2G), and loose aggregates. This trend was the same for all threshold levels, suggesting -at face value -that the overall particle shapes were preserved (Figure 9). However, the median population solidity was different for the different thresholds (Figure 8), likely because an increase in threshold leads to the inclusion of more pixels, which filled in the "holes." The effect of including lighter pixels on particle size and shape is also indicated in the roundness of particles (Figure 8). Those algorithms with a higher median threshold estimated particles to be both larger and rounder. An increase in roundness was likely a result of, as for solidity, higher thresholds including more pixels, such as "peripheral" pixels.
The complex relationship between size, solidity and roundness highlights problems when reducing images to simple particle metrics such as ESD. Figure 2F, for example, shows a colony (likely the diatom Chaetoceros socialis) that forms large structures made of shorter chains of individual cells. As a consequence, this particle is very loosely packed with a large fraction of the particle within the convex hull being above the threshold (S = 0.20). The traditional roundness metric (area/F) suggests that this particle is very elongated (R trad = 0.10), equivalent to an ellipse with a width of 30 px. However, the effective width of the particle (width of ellipse with equivalent hull area and maximum length) is 154 px. Our adapted roundness metric, which uses the hull instead, suggests that the particle is reasonably round (R = 0.52), which matches the particle image much better. An inappropriate choice of particle shape descriptor that does not take into account the particle's solidity and/or an inappropriate choice of threshold settings will result in a wrong description of the particle population. Such distinctions are important for the interpretation and modeling of how particles behave, e.g., in terms of sinking velocity and aggregation/disaggregation.

CONCLUSION
We here show that the choice of threshold algorithm is critical in particle sizing and, consequently, POC concentration and flux estimates. The final estimates can vary by >20% when selecting different algorithms for analysing a single dataset. This uncertainty is exacerbated when different methods are mixed, such like the cases when datasets from different studies or even devices are merged.
The knock-on effects of "blindly" mixing data sets are likely most severe when the estimated particle sizes are run through a series of conversions, as was done here. (1) First, we imaged particles and estimated ESD using one method and threshold setting. (2) We then used a conversion from ESD to POC contents that was based on a different camera system with an unknown threshold setting and calibration.
(3) We estimated the sinking velocity based on data from a third camera system, again without clear knowledge of the image analysis routine and calibration. (4) Finally, we multiplied POC contents and sinking velocities to arrive at POC fluxes. Each step introduces uncertainties leading, potentially, to large errors in the final estimate of POC contents and fluxes. For example, before accounting for solidity when calculating particle POC content (Eq. 1), calculated POC concentrations based on particle images were up to an order of magnitude higher than measured concentrations.
The interpretation of particle shapes (e.g., solidity and roundness) is also affected. Image analysis routines that set a high threshold, and thus include more pixels, will not only estimate particles to be larger than those using a low threshold but also to have a different shape and density. Uncertainty in particle shapes has implications for data interpretation (e.g., density, sinking behavior) and particle modeling (e.g., aggregation/disaggregation; Burd and Jackson, 2009).
We conclude that it is important to make an informed decision on the particle detection algorithm when analysing particle images, and we therefore make the following recommendations: 1. The thresholding decision should be justified and clearly stated in the method section for each study. 2. The threshold choice should be based on calibrations with beads or natural aggregates of known size. 3. The equations used to describe particle size and shape should be clearly stated. 4. To allow merging datasets from different devices, all devices should be calibrated using the same calibration standard.
Currently, there are no calibration standards for particle sizing and no specifications on how to report thresholding. We therefore urge a collective effort to develop calibration routines and specifications to accurately describe particle size. Optical particle measurements are rapidly emerging as a major tool for understanding biogeochemical cycles and food webs in the ocean (Lombard et al., 2019;Giering et al., 2020). To fully leverage these exciting technological advances and the insights they provide, the research community needs a framework that encourages increased transparency in image analysis routines (including threshold choices) and allows merging data from the plethora of optical devices that are now used to explore the ocean.

DATA AVAILABILITY STATEMENT
All vignettes (n = 6400) and their metadata including size measurements are provided in the Supplementary Material. Examples of the code used to take particle size measurements can be found in the Supplementary Material and at github.com/brett-hosking/supplementary.

AUTHOR CONTRIBUTIONS
SG designed the study and wrote the manuscript with support from all authors. BH wrote the image analysis codes. NB contributed to the method development and interpretation of the results. MI contributed to the data. All authors contributed to the article and approved the submitted version.

FUNDING
This work was funded by the UKRI through the COMICS project (Controls over Ocean Mesopelagic Interior Carbon Storage; NE/M020835/1) and the National Capability funding. Contributions by NB were funded by a European Research Council Consolidator grant (GOCART, agreement no. 724416). This publication resulted in part from support from the U.S. National Science Foundation (Grant OCE-1840868) to the Scientific Committee on Oceanic Research (SCOR) and from funds contributed by National SCOR Committees.