3D Surface Reconstruction of Plant Seeds by Volume Carving: Performance and Accuracies

We describe a method for 3D reconstruction of plant seed surfaces, focusing on small seeds with diameters as small as 200 μm. The method considers robotized systems allowing single seed handling in order to rotate a single seed in front of a camera. Even though such systems feature high position repeatability, at sub-millimeter object scales, camera pose variations have to be compensated. We do this by robustly estimating the tool center point from each acquired image. 3D reconstruction can then be performed by a simple shape-from-silhouette approach. In experiments we investigate runtimes, theoretically achievable accuracy, experimentally achieved accuracy, and show as a proof of principle that the proposed method is well sufficient for 3D seed phenotyping purposes.

Plant seed phenotyping is needed by seed banks for quality management e.g., concerning breeding purposes, linking to germination rate or plant growth. For this, 2D scanning is a popular, affordable technique (Herridge et al., 2011;Tanabata et al., 2012;Moore et al., 2013;Whan et al., 2014). Several commercial software packages are available for seed investigations using flat-bed scanners (e.g., Regent Instruments, 2000;Next Instruments, 2015). It has been applied to different seed types, like Arabidopsis, soybean, barley, or rice. Typically parameters like width, length, or area are calculated from the 2D images, but also more complex shape measures like Fourier descriptors (Iwata and Ukai, 2002;Iwata et al., 2010). However, to the best of our knowledge, no affordable 3D imaging technique has been presented so far designed for seed measurements. Correspondence-based techniques (Quan et al., 2006;Paproki et al., 2012;Pound et al., 2014Pound et al., , 2016Santos and Rodrigues, 2015) reconstructing 3D models from multiple images, or other low-cost techniques like laser scanning or the Kinect can be used for 3D whole plant reconstruction (Paulus et al., 2014) or root systems in transparent gel (Fang et al., 2009). However, such techniques are not suitable for much smaller objects like seeds of rapeseed plants (∼2 mm diameter) or even Arabidopsis seeds (∼0.2-0.4 mm length).
Here we investigate volume carving, a well-known shapefrom-silhouette technique (Martin and Aggarwal, 1983;Potmesil, 1987;Laurentini, 1994), for 3D seed shape reconstruction. It is a fast, reliable, and simple but robust method, having been used in plant phenotyping before, e.g., 3D seedling reconstruction (Koenderink et al., 2009;Golbach et al., 2015) or root system investigations (Clark et al., 2011;Zheng et al., 2011;Topp et al., 2013). Depending on the selected viewpoints it approximates the convex hull of an object or reconstructs even valleys and saddle-points, but cannot reconstruct true concavities. Most seeds are, however, relatively smooth, convex objects. For the seed types investigated here (Arabidopsis, barley, and maize, see Figure 1), true concavities seem to be of low relevance for volume estimation. For non-smooth seeds, like e.g., seeds of the plant parasites Phelipanche aegyptiaca, or Orobanche cernua the proposed method may be less suitable.
This paper is an extension of our conference publication (Roussel et al., 2015), thus theory (Section 2 and 3), and some experiments from Section 4 are mainly repeated from there. We extend the theory by an accuracy check and iterative camera position correction procedure, and the experiments by a numerical and experimental investigation of achievable accuracy vs. number of images in Sections 4.2 and 4.3. Further we updated references and discussion.

RECONSTRUCTING SEED SHAPE FROM SILHOUETTES
Aiming at relatively simple, mostly convex seed shapes, target voxel resolutions needed to describe such shapes are comparably low-as we will show in the experiments below, see Section 4. Therefore, for this study, it is sufficient to apply one of the most basic volume carving approaches.
We get the intrinsic camera matrix K (Hartley and Zisserman, 2004) and the distance between the origin of our working volume and the camera center from calibration (cmp. Section 3.2). The origin of the working volume is selected to be the tool center point (TCP) of the robot system handling the seeds, i.e., rotating them in front of the camera for imaging (cmp. Section 3.1).
We acquire N images, showing a seed under (equidistantly spaced) rotation angles α i where i ∈ {1, . . . , N}, see Figure 1. Rotation is around the vertical axis through the TCP, being parallel to the y-axis of the camera. We segment by gray-value thresholding each image into a binary mask M i being one at the foreground, i.e., seed and tool tip, and zero at background locations. Small objects like noise are removed and small holes (e.g., the reflection of the tool) filled.
For each image and thus segmentation mask we calculate the homogeneous camera projection matrix P i , from the rotation angle α i by where R i is the rotation matrix corresponding to the given angle α i , and translation vector t i is calculated using the distance of the world origin to the camera center, also known from calibration (see e.g., Hartley and Zisserman, 2004). By this, the world coordinate frame rotates with the object, i.e., the seed.
We define an equidistantly spaced, cubic voxel grid around the world origin, being large enough to contain the seed. The thus defined working volume depends on the seed type. For Arabidopsis we use (1 mm) 3 , for rapeseed (2.9 mm) 3 , and for barley and maize (13 mm) 3 .
Each voxel center with homogeneous world coordinates X is projected to a point x i in each mask M i by If X is projected to the background region of at least one of the N masks M i , then this voxel does not belong to the foreground object and its value V( X) is set to 0, i.e., Thus, if a voxel belongs to the foreground object, its value V( X) is set to 1. When higher voxel resolution is desired, and thus runtimes increase, parallelization of the carving algorithm (Brenscheidt, 2014) is feasible (see Section 4.1). Even higher resolutions become available on current desktop computer hardware, when hierarchically representing the voxel grid, e.g., as an octree (Szeliski, 1993;Klodt and Cremers, 2015).
One of the main drawbacks of this simple carving algorithm is its sensitivity to imprecise external camera calibration. When a mask M i is misaligned and thus does not well overlap with the "true" object volume, the non-overlapping parts are deleted from the volume without further testing or corrections. We therefore apply an image-based camera pose calibration step, as described next.

Correcting Camera Pose
Methods not adapting camera pose by estimating extrinsic parameters from the acquired images are known to be particularly sensitive to (extrinsic) calibration errors, thereby requiring precise positioning of the cameras (see e.g., Yezzi and Soatto, 2003). For relatively large objects in the multiple centimeter-range, say 20 cm long and filling most of an image, and typical pixel resolutions, say 2000 × 2000, a pixel covers an object area 0.1 × 0.1 mm 2 . Position repeatability of industrialgrade robotic systems, typically ≤ 20 µm and ≤ 0.05 • (Denso Robotics Europe, 2015), is therefore high enough for precise reconstruction. However, for objects being few millimeter in size or even in the sub-millimeter range additional care has to be taken. The mathematical TCP coordinates known to the robot control software may not coincide precisely with the physical TCP at the tool tip, due to mechanical calibration inaccuracies, wear and tear, or small deformations of the tool. In our case, instead of being at a fixed location in the camera images, the TCP moves on a more or less reproducible, elliptic trajectory of up to 200 µm diameter, varying with room temperature.
Before projecting the voxels to the mask images, we therefore adapt projection matrices P i . If a suitable non-changing target moving with the TCP is visible in all images, image registration can be done using simple normalized cross-correlation (see Figure 2).
In our robotic application, in order to adapt projection matrices P i , the truncated cone shape of the gripping tool has to be found, see Figure 3. As larger seeds may partly occlude the tool tip, we search for a region of the tool being reliably visible in the images. The tool enters the image vertically from below and becomes smaller in diameter toward the true TCP, being the center point of the very tip of the tool. As we can robustly find the tool's left and right edges, we apply a simple and very fast procedure. We calculate the visible width of the tool line by line starting at the bottom of the image, moving upwards, i.e., in negative y-direction. We iterate while the width decreases and is larger than the minimum tool width (being at the tip). The thus reached y-coordinate is taken as first estimate of the TCP y-coordinate y TCP . A reliable estimate of the TCP's x-coordinate x TCP is established as the mean of all found left and right edge x-positions. As the tool tip may be partly occluded by the seed, y TCP needs refinement. For this the left and right tool edges are independently tracked further until the narrowest point is reached, i.e., the rightmost point of the left edge, and the leftmost point of the right edge. The smallest y-value (highest point) of the two points is taken as new y TCP .
For small seeds like Arabidopsis this procedure works reliably, as the seeds are too small to occlude the whole tool tip in an image. For larger seeds, we use the observation that the TCP's elliptic trajectory results in its y-coordinates to describe a sinusoidal curve over the rotation angle. We therefore robustly fit a sin-curve to the y-coordinates and correct outliers according to the fit result.
For such small objects, the optical lens setup (cmp. Section 3) features a narrow opening angle (i.e., large zoom), like a microscope at 1-to-1 magnification. This means lines of sight are almost parallel and thus depth effects are negligible. This allows to update P i with x TCP by simply setting the principal point (Hartley and Zisserman, 2004) to x TCP .
In our experiments we observed that x TCP can be estimated reliably with pixel accuracy, when no disturbances like small dust particles are present. Maximum offset in locating TCP from an unoccluded tool tip was 2 pixel.
In situations, where larger inaccuracies in locating x TCP occur, testing consistency of results and correcting x TCP is recommended. Back-projection of the reconstructed 3d object is a simple procedure allowing to test whether or not P i is correct and the selected segmentation procedure is suitable. For this test, each surface voxel of the found 3d object is projected to a mask imageM i , initially filled with zeros. A voxel is projected toM i by projecting its corners toM i using P i and filling the respective convex hull with ones. If no errors occurred, the thus generated foreground mask should be identical to the segmentation mask M i (up to ignored subpixel effects when filling the convex hull, leading to a potentially slightly dilated maskM i ). Measuring overlap between the two masks can be done using well established measures, e.g., the Overlap Ratio Criterion = |M i ∩ M i |/|M i ∪ M i | (see e.g., Everingham et al., 2010), or Dice Similarity Coefficient = Dice (1945) and Sørensen (1948), where | · | denotes set cardinality. These measures are FIGURE 2 | Example provided together with source code as supplemental material. Imaging is done using affordable hardware, i.e., a usual SLR camera (Nikon D7000, AF-S Nikkor 16-85 mm 1:3.5-5.6 GED lens at f = 85 mm) and a motorized turntable (Steinmeyer DT130-360 • -SM01) for rotating the seed. As tool keeping the seed above the turntable's plane, we use a cut-off ball-pen-tip. A smaller tool, like in our robotic setup, allows for better seed visibility and reconstruction, however it is not easy to build. Even though the SLR is mounted on a sturdy tripod and released by a remote control, the camera center moves from image to image. In this setup, the base of the tool is visible in each image and can be used as target for correlation-based image registration. From the calculated image shifts, projection matrices P i are adapted accordingly. See provided source code for more details. (A) SLR image showing a maize seed, (B,C) cropped 760×564-image with 293×100 target for cross correlation indicated in red, (D,E) mask image with borders of the user-defined reconstruction volume projected back into the image in blue, (F) reconstructed seed and tool, (G) reconstructed seed with tool removed, (H,I), reconstructed seed from a more bottom and more top view.
used throughout image analysis and are common in plant imaging as well (Minervini et al., 2014).
In case the achieved mask overlap is too small but larger than zero, iterative procedures can be applied to increase accuracy. Straight forward is to (1) shift the principal point in each projection matrix P i such that the center of mass ofM i coincides with the center of mass of M i and (2) recarve, and iterate both steps until convergence or suitably large overlap. Alternatively, gradient descent-based algorithms optimizing camera pose may be applied as a refinement step (e.g., Yezzi and Soatto, 2003).

Removing the Tool from the Seed
For small seeds not overlapping with the tool, the TCP lies precisely in the world origin, i.e., the origin of the reconstructed voxel block. Thus, voxel above the TCP contain the seed, voxel below (which in that case we do not reconstruct) contain the tool. In cases where seed and tool may overlap (see e.g., Figure 3), the tool tip is also reconstructed. It can be removed from the volume data using its known position, orientation, and physical size by deleting the corresponding voxel volume.
Alternatively, at high voxel resolutions, where the reconstructed volume covered by the tool may be affected by noise, one can estimate the tool position from the reconstruction. Summing up voxel values of horizontal planes in the bottom region of the volume gives reliable estimates of the area of horizontal cuts through the tool. While the areas decrease when summing over higher and higher planes, the planes are deleted from the data. Then, when areas no longer decrease, using these areas, we estimate the y-position of the truncated cone using a least squares fit and remove the thus covered volume.

Imaging
Depending on seed size for 3D reconstruction we use two different setups for image acquisition. Both setups consist of an industrial-grade c-mount camera (PointGrey Grasshopper, GRAS-50S5M-C, Mono, 5.0 MP, Sony ICX625 CCD, 2/3", square pixels of size µ = 3.45 µm, global shutter, 2448 × 2048, 15 FPS), 35 mm high precision lens (Schneider KMP APO-XENOPLAN 35/1,9) and a white LED ring with diffusor (CCS LDR2-70-SW2) shown in Figure 4. For small seeds (e.g., Arabidopsis, tobacco, rapeseed) a 36 mm spacer is mounted between camera and lens. For larger seeds (e.g., barley, maize) only a 15 mm spacer is needed. Spacer reduce the minimum working distance of the lens (d = 69.9 mm for the 36 mm spacer, 128.0 mm for the other) and thus are responsible for suitable magnification. This allows to measure seeds in a range between ≈0.2 and 12 mm. White paper is used as background.
For image acquisition seeds are picked by a cone-shaped vacuum nozzle and held in front of the camera at optimal working distance using a robotic system to be described elsewhere. The robot rotates the seed in configurable angles and triggers the camera. We use 10 • steps and take 36 images, if not stated differently. Image acquisition times are mainly limited by the robot's rotation typically 2.7 − 5 s per 360 • , depending on the motion type. We perform 36 stop-and-go steps resulting in an overall acquisition time of ≈ 6 − 7 s.

Camera Calibration
We use the OpenCV implementation (Bradski and Kaehler, 2008) of Bouguet's calibration method (Bouguet, 1999) and an asymmetric 4 × 11 dot-pattern target with a total size of 5.8 × 4.3 mm. It was printed using a professional, high resolution film recorder, as usual office printers even on good paper do not achieve a printing precision suitable for camera calibration at such small spatial scales.
Using this toolbox, estimation of the focal length f is not precise enough for our purposes. We therefore use a ballbearing ball (steel, precision according to DIN5401 G20) with r 0 =1.50 mm ±0.25 µm radius as calibration object, in order to estimate the working distance d (or equivalently focal length f from working distance d) of our system precisely. From a mask image of the ball acquired with our system, we estimate its area A in pixel. This allows to estimate its radius r in the image by r = µ √ A/π , where µ is the pixel size. From basic geometric reasoning working distance d can be derived as d = f 2 + r 2 r 0 /r.

Software Implementation
The software framework is implemented in C++ on a Windows 7 operating system with Visual Studio 2013. The application programming interface Open Graphics Library (OpenGL OpenGL.org, 2015) was used for the GPU implementation.
As supplemental material (Roussel et al., 2016) we provide both, a suite of Matlab (Mathworks, 2015) routines as well as a Python implementation suitable for volume carving of not too small seeds using an affordable imaging setup. Such setups may e.g., use a turntable and a consumer SLR camera. In our example (see Figure 2) we use normalized cross-correlation for image registration, as the bottom of the tool is always visible as a suitable registration target.

Resolution and Runtime
The complexity of the volume carving algorithm is proportional to the number N V of voxels and number N of images acquired. For our equidistantly spaced cubic R × R × R grids the voxel number is N V = R 3 and thus complexity is O(R 3 N). In addition time for loading (or acquiring) the images (with N P pixels) and, for the GPU implementation, transferring the data to and from the graphics card is needed. Complexity of this data transfer and preprocessing of the images is O(N P N), or O(N P N) + O(R 3 ) for the GPU implementation.
Runtimes shown in Figure 5D have been measured on a PC with Intel Core i5-3470 CPU, 8GB DDR3 RAM and an NVIDIA GeForce GTX 580 GPU with 4047MB GDDR5 RAM (cmp. Brenscheidt, 2014 for further details). We observe that for low resolutions R of the voxel grid, runtime contributions by the O(N P N) components dominate, as no dependence on R is visible. For increasing R, these parts become negligible. While for the CPU implementation a significant increase of the runtime vs. the 2 s runtime for smallest voxel resolutions can be noticed at R = 256 (4 s), the parallel GPU implementation stays at comparable runtimes even at R = 512.
When interested in a seed's volume as a trait used for high throughput phenotyping, rather than in subtle surface details, voxel resolution can be selected comparably low. In Figures 5A-C we show a barley seed and its reconstructions together with its derived volume for different grid resolutions R. We observe that above R = 256 the estimated volume is approximately constant. Thus, for this phenotyping task, runtime is limited mainly by file-IO, transfer and preprocessing. Sophisticated speed-up mechanisms for the carving provide rather low benefits in this application, as their main potential lies in higher achievable volume resolutions.
Speedup using fewer images may be paid by lower accuracy (see Section 4.2). We show reconstructions of the same barley seed in Figure 5E and the corresponding volumes in Figure 5F. Images are selected equidistantly. We observe that reducing image number rapidly reduces reconstruction quality. Interesting to note is that the reconstruction using N = 9 images is more accurate than with N = 12 images. This is due to the fact that for N = 12 the selected angle between images is α = 30 • , thus 180 • is a multiple of α (the same is true for N ∈ {4, 6, 12, 18, 36}, cmp. also Figure 8 and Section 4.2). However, as the opening angle of our lens is small, complementary information content in masks coming from cameras looking in opposite direction is low. We conclude that for shorter runtimes with comparable or even higher reconstruction accuracy investigating alternative viewing directions is promising. We do this in the next section for the restricted possibilities of our robotic, turntable-like, single camera acquisition system.

Accuracy vs. Number of Images
We numerically investigate the influence of the number N of equidistantly acquired images on the accuracy of volume estimation in an ideal turntable setting. To do so, we calculate the volume V num of a sphere with radius r 0 derived as cut of tangent cones, cmp. Figures 6A,B,D: Each ideal camera is represented by its camera center C i , known from the selected working distance and rotation angle. A sphere projects to a circle on the sensor plane. A volume carving step for each image of this sphere thus corresponds to testing for each point of the working volume, if it lies inside or outside a cone spanned by C i and the outline of the projected sphere (transparent cones in Figure 6B). The cone is independent of the focal length f of the ideal camera, but depends on the working distance d, i.e., the distance between each camera center and the center of the sphere. We select r 0 = 1.5 mm, as we use a highly accurate, spherical bearing ball of this size as ground truth object in real experiments (see Section 4.4).
As we did not find a closed form solution for the volume of an object derived by a cut of N cones for arbitrary N, we numerically integrate the volume by a Monte-Carlo method: The smaller V reg can be selected, the more accurate V num can be approximated with a fixed number K of sampling points. We observe that the selected turntable camera configuration (Figures 6A,C) is symmetric with respect to • the plane spanned by the cone centers i.e., the equator of the sphere, • each plane spanned by the rotation axis and a camera center, • each plane spanned by the rotation axis and cutting the rotation angle between two adjacent camera centers in half, i.e., the vertical plane between two cameras.
Furthermore, we observe that • the inner of the sphere lies completely in the carved volume, i.e., we carve the sphere from the outside; and finally that • the sought-for volume lies in a concentric sphere with a somewhat larger radius than the carved sphere.
This allows to restrict numerical calculations to a region with known volume V reg being box interval in spherical coordinates (cmp. Figure 6E, yellow region), in order to benefit optimally from symmetries in the problem. We restrict the altitude angle to θ ∈ [0, π/2], azimuth to φ ∈ [0, π/(2N)], and the radius to r ∈ [r 0 , r 1 ]. Radius r 1 is calculated by (1) intersecting all cones with the plane spanned by the camera centers, yielding a pair of lines for each cone, (2) selecting the right lines of two adjacent cones (cmp. example red lines in Figures 6C,E) (3) calculating the intersection between these lines, and (4) selecting r 1 as radius of the point given in polar coordinates. This geometry is depicted in Figures 6C,E. The selection correspond to one section of a spherical shell, cut in half by the plane spanned by the cone centers (i.e., at the equator of the sphere) and cut in 2N sections by the half-planes starting at the axis of rotation symmetry (i.e., the axis through the poles of the sphere) and each including one cone center; as well as their angle bisector planes.
Randomly sampling points in spherical coordinates produces higher point densities toward the origin in Euclidean coordinates and toward the north-south-axis of the sphere. We correct for these density differences combining two approaches. To understand this we need to calculate the Jacobian of the spherical coordinate transformation from Euclidean coordinates. Using the convention we get the Jacobian J J = det ∂(x, y, z) ∂(r, θ, φ) = det   sin θ cos φ r cos θ cos φ −r sin θ sin φ sin θ sin φ r cos θ sin φ r sin θ cos φ cos θ −r sin θ 0   = r 2 sin θ .
As long as r 0 and r 1 do not differ too much, as in our case, it is sufficient to consider the radial part of J by weighing sampling points p by their radius value r p whereK in = p∈P in r 2 p andK = p∈P r 2 p . P is the set of all sampling points and P in ⊂ P the set of all inliers.
In Figure 7 relative error E = (V num − V 0 )/V 0 with V 0 = 4/3πr 3 0 is depicted vs. the number N of images used for reconstruction of V num . We observe that for parallel projection, using an even number of cameras (or images) yields the same result as using half the number of images. This makes sense, as for an even number of cameras, pairs of cameras are in an 180 • -configuration, looking at the same object contour from different sides. This does not add additional information to the reconstruction. However, when considering central projection also images from an 180 • -configuration add additional information, as cameras do not look at the same contour. Figure 8 shows volumes reconstructed using different camera configurations. We observe that using N = 5 or N = 10 camera positions yields identical results for parallel projection in Figure 8, left. However, for central projection N = 10 camera positions yields a much better reconstruction than using N = 5 images. This is in full agreement with the plot given in Figure 7, bottom right, where we also observe, that using odd N is not always better than using even N with one additional camera position. It depends on object size in relation to working distance, which camera position configuration yields better results in theory. We observe, that in the cases tested here, a minimum of N = 11 or N = 12 images is needed to stay below 1% relative error. A limit of less than 0.1% error is reached using N = 33, N = 36, or N = 37 images. In practice, however, given a sufficiently high N, other error sources may dominate (see next Sections).

Accuracy Loss Due to Position Inaccuracies
We performed experiments using synthetic images in order to test the different error sources in our method. The images showed a perfect sphere as projection of a synthetic ball of same size as our bearing-ball. Geometrical setup was as for the experiments in Section 4.2, with working distance d = 69.9 mm. We reconstructed the ball using volume carving, a voxel grid resolution of 256 3 and (12 µm) 3 voxel size. Results were very similar to the ones shown in Figure 7 for the same d, e.g., relative error at 36 images was approximately 0.14% instead of 0.1%. We do not show this plot. We conclude, that volume carving on a fine enough grid comes close to the theoretical performance of the simulation from Section 4.2, in agreement with our observations in Figure 5C.
Estimating TCP locations in the images is critical for reconstruction accuracy. To test the influence of a TCP wrongly located in an image, we add a relatively large offset of 7 pixel in y-direction to x TCP in the first image and leave the TCP locations in all other images untouched. The carving result is shown in Figure 9. The volume loss due to the misaligned TCP is in the order of 0.001 to 0.002 mm 3 , and thus reduces the positive volume error due to non-carved regions (as described in Section 4.2). For image configurations with relatively low positive initial error, i.e., here N > 36, the error due to the TCP offset dominates, such that overall relative error is negative. Please note that Figure 9 shows absolute values of the relative errors due to the log-scale.

Seed Types and Overall Accuracy
Using the proposed method we reconstructed different seed types, namely Arabidopsis (length, i.e., longest dimension 0.5 mm), rapeseed (≈ 2 mm), barley (≈ 8 mm), and maize (≈ 11 mm). See Figure 10 to get an impression of the usually achieved reconstruction accuracy. Absolute accuracy is validated experimentally using two different test objects. The first one is the ball-bearing ball we used for working distance calibration with 3.00 mm ±0.5 µm FIGURE 9 | Absolute value of relative error of volume vs. number N of camera positions in a synthetic volume carving experiment, where an offset of 7 pixel has been introduced in x TCP of the first image. Configuration is identical to the simulation results shown in Figure 7. Yellow and dark green points have been repeated from there. diameter, i.e., 0.02% diameter tolerance, and thus a precisely known volume of 14.137 mm 3 ±0.007 mm 3 , i.e., 0.05% volume tolerance. Clearly, as we used this object for calibration, scaling of the mask images exactly fit to the respective projection matrices P i . However, the overall performance of the system for volume reconstruction can still be evaluated using this object, as the volume derived has not been used for calibration and still accumulates all errors and imperfections the method has. The second object is an ink cartridge ball with 2.45 mm ±0.02 mm diameter, i.e., 0.4% diameter measurement error, measured with a digital sliding caliper, and thus a volume of 7.70 mm 3 ± 0.19 mm 3 , i.e., 2.5% volume error.
In Figure 11 volume error vs. number N ∈ {27, . . . , 36} of imaging positions is shown for the ball-bearing ball experiment. Comparing results to Figure 7, lower left (values repeated in Figure 11 for easy reference), we observe that negative reconstruction errors due to positioning inaccuracies are in the same range as the positive theoretical errors and increase with increasing number of images. Please note that the plot shows absolute values of relative errors, as negative errors can not be displayed in log-scale.
Reconstructing the ball using 36 images as used for seed reconstruction, a voxel grid resolution of 256 3 and (12 µm) 3 voxel size, i.e., a volume quite tightly surrounding the object,  yields a volume of 14.11 mm 3 and thus a mean diameter of 2.998 mm for the bearing ball. This is a relative error of −0.19% wrt. the specified volume, and of −0.06% wrt. the diameter, when calculating the diameter from the measured volume, assuming a perfect sphere. For the ink cartridge ball we measure 7.83 mm 3 corresponding to a mean diameter of 2.46 mm, being well within the measuring error of our caliper measurement.
We conclude that the overall accuracy of our method, including camera calibration error, mechanical imperfections, TCP finding error, imprecision due to the simple carving approach etc. is high enough to compete with or even beat a precise slide caliper for length measurements. Absolute values of measurement errors of volume and lengths are in the range of few per mill.

CONCLUSION AND OUTLOOK
Simple volume carving combined with a method for extrinsic camera pose estimation from images is sufficiently accurate for size measurements of even tiny seeds. To optimize our system for runtime and accuracy, we investigated its performance using different parameter settings. Surprisingly, the main performance gain potential does currently not lie in using more sophisticated reconstruction methods allowing for higher achievable voxel resolutions R, e.g., achievable by the highly accurate method presented by Klodt and Cremers (2015) and necessary for reconstruction of more complex surfaces. Our findings allow reducing preprocessing and transfer times by selecting a suitable image number N and comparably low voxel resolution of 256 3 .
The optimal number N of acquisition positions used in a turntable setting depends on the selected projection geometry. In our case using N = 36 images yields a theoretical overestimation of the reconstructed volume of a sphere by +0.1% relative error.
The method's achievable accuracy has been tested experimentally using a highly accurate spherical object. Systematic errors are much lower than we expected, between +0.06% for N = 27 and −0.19% for N = 36. This means that volume losses due to inaccurate positioning of the object are truly negligible for our purposes. Clearly, as seeds are not well represented by a ball, such accuracy studies give insight in the accuracy potential of the method-if it fails on a ball, it will also fail on more complex shapes.
Seed-shape-specific errors are not well captured by a ball and may vary from seed type to seed type. Alternative simple volume measurement methods for ground truthing, e.g., Archimedes' principle, are not accurate enough for such small objects, but high-resolution CT may be an option.
Many factors influence the accuracy, e.g., segmentation errors, small dust particles or camera pose errors. Most critical are inaccuracies of x TCP in the image, leading to parts of a seed being erroneously carved away. To detect such errors, suitable error estimation methods can be implemented to complement the method proposed here, e.g., summing back-projection error. When positioning accuracy is an issue, more elaborated, iterative, optimization-based but also more costly to calculate methods may be applied. However, as we have seen in our experiments, achieved accuracies are well high enough for seed phenotyping even without such corrections.
Overall we conclude that the presented method yields highly accurate seed reconstructions being accurate enough when interested in seed volume.

AUTHOR CONTRIBUTIONS
SJ and HS designed the robotic system used for seed handling and image acquisition. JR and AF implemented robot and acquisition hard and software. JR performed all lab experiments. FG performed all numerical experiments and implemented the supplemental Python code. HS implemented the baseline method, the supplemental Matlab code, and drafted the manuscript. All authors contributed to text and figures of the manuscript and approved the final manuscript.