Using a structural root system model to evaluate and improve the accuracy of root image analysis pipelines

Root system analysis is a complex task, often performed with fully automated image analysis pipelines. However, the outcome is rarely verified against ground-truth data, which might lead to underestimated biases. We used a root model, ArchiSimple, to create a large and diverse library of 10,000 ground-truth root system images. For each root system, images were generated at three levels of noise. This library was used to evaluate the accuracy and usefulness of several image descriptors classically used in root image analysis software. Our analysis highlighted that the accuracy of the different traits is strongly dependent on the quality of the images and on the type, size and complexity of the root systems analysed. Our study also demonstrated that machine learning algorithms can be trained on a synthetic library to improve the estimation of several root system traits. Overall, our analysis is a call for caution when using automatic root image analysis tools. If a thorough calibration is not performed on the dataset of interest, unexpected errors might arise, especially for large and complex root images. To facilitate such calibration, both the image library and the different codes used in the study have been made available to the community.


1 Introduction

Roots are of utmost importance in the life of plants, and selection on root systems therefore holds great promise for improving crop tolerance to biotic and abiotic stresses (as reviewed in Koevoets et al., 2016). As such, root quantification is a challenge in many research projects. This quantification usually involves two steps. The first step consists in acquiring images of the root system, using either classic imaging techniques (CCD cameras) or more specialized ones (microCT, X-ray, fluorescence, ...). The next step is to analyse the pictures to extract meaningful descriptors of the root system.

To paraphrase the famous Belgian surrealist painter René Magritte: "figure 1A is not a root system". Figure 1A is an image of a root system, and that distinction is important. An image is indeed a two-dimensional representation of an object that is usually three-dimensional. Nowadays, measurements are generally not performed on the root systems themselves but on their images, and this raises some issues.

Image analysis is the acquisition of traits (or descriptors) describing the objects contained in a particular image. In a perfect situation, these descriptors would accurately represent the biological object in the image, with negligible deviation from the biological truth (or data). However, in many cases, artefacts might be present in the images, so that the representation of the biological object is no longer accurate. These artefacts might be due to the conditions under which the images were taken or to the object itself. Mature root systems, for instance, are complex branched structures composed of thousands of overlapping (fig. 1B) and crossing segments (fig. 1C). These features are likely to impede image analysis and create a gap between the descriptors and the data.

Root image descriptors can be separated into two main categories: morphological and geometrical descriptors.
Morphological descriptors refer to the shape of the different root segments forming the root system (table 1). They include, among others, the length and diameter of the different roots. For complex root system images, morphological descriptors are difficult to obtain and are prone to error, as mentioned above. Geometrical descriptors give the position of the different root segments in space. They summarize the shape of the root system as a whole. The simplest geometrical descriptors are the width and depth of the root system. Since these descriptors are mostly defined by the external envelope of the root system, crossing and overlapping segments have little impact on their estimation, and they can hence be considered relatively error-free. Geometrical descriptors are expected to be only loosely linked to the actual root system topology, since identical shapes could be obtained from different root systems (the opposite is true as well […] usually not on ground-truth images, but in comparison with previously published tools (measurement of X with tool A compared with the same measurement with tool B). This might seem a reasonable approach given the scarcity of ground-truth images of large root systems. However, the inherent limitations of these tools, such as scale or root system type (fibrous vs. tap roots), are often not known. Users might not even be aware that such limitations exist and may apply the provided algorithm without further validation on their own images. This can lead to unexpected errors in the final measurements.

One strategy to address the lack of in-depth validation of image analysis pipelines would be to use synthetic images generated by structural root models (models designed to recreate the physical structure and shape of root systems). Many structural root models have been developed, either to model specific plant species (Pagès et al., 1989) or to be generic (Pagès et al., 2004; …).
These models have repeatedly been shown to faithfully represent root system structure (Pagès and Pellerin, 1996). In addition, they can provide the ground-truth data for each synthetic root system generated, regardless of its complexity. However, they have not been used for validation of […]

Creation of a root system library
We used the model ArchiSimple, which was shown to allow the generation of a large diversity of root systems with a minimal number of parameters (Pagès et al., 2013). In order to produce a large library of root systems, we ran the model 10,000 times, each time with a random set of parameters (fig. 2A). For each simulation, the growth and development of the root system were constrained to two dimensions.

The simulations were divided into two main groups: fibrous and tap-rooted. For the fibrous simulations, the model generated a random number of root axes and secondary (radial) growth was disabled. For tap-root simulations, only one root axis was produced and secondary growth was enabled, its extent being determined by a random parameter.

The root system created in each simulation was stored in a Root System Markup Language (RSML) file. Each RSML file was then read by the RSML Reader plugin from ImageJ to extract ground-truth data for the library. These ground-truth data included geometrical and morphological parameters (table 1). For each RSML file, the RSML Reader plugin also created three JPEG images (at a resolution of 300 DPI) with different levels of noise, using the Salt and Pepper filter in ImageJ (fig. 2D). For each root system, we computed an overlap index as the number of root segments overlapping with other root segments divided by the total number of root segments.
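The overlap index defined above can be illustrated with a short Python sketch. It assumes, hypothetically, that each root segment is available as a pair of 2D end points (for instance taken from the RSML geometry) and that two segments "overlap" when they properly cross each other in the projected image. The exact geometric test used in the study is not described here, so both the data layout and the crossing criterion are assumptions.

```python
from itertools import combinations


def orientation(p, q, r):
    """Sign of the cross product (q - p) x (r - p): > 0 counter-clockwise, < 0 clockwise, 0 collinear."""
    return (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])


def segments_cross(a, b):
    """Proper crossing test (assumed overlap criterion): the end points of each segment lie
    strictly on opposite sides of the other segment. Segments that merely share an end point,
    such as consecutive segments of the same root, are not counted."""
    d1 = orientation(a[0], a[1], b[0])
    d2 = orientation(a[0], a[1], b[1])
    d3 = orientation(b[0], b[1], a[0])
    d4 = orientation(b[0], b[1], a[1])
    return d1 * d2 < 0 and d3 * d4 < 0


def overlap_index(segments):
    """Number of segments crossing at least one other segment, divided by the total number of segments."""
    if not segments:
        return 0.0
    overlapping = set()
    # O(n^2) pairwise comparison; adequate for a sketch, slow for very large root systems
    for i, j in combinations(range(len(segments)), 2):
        if segments_cross(segments[i], segments[j]):
            overlapping.update((i, j))
    return len(overlapping) / len(segments)
```

For example, two segments forming an X plus one isolated segment give an overlap index of 2/3.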

Root image analysis
Each generated image was analysed using a custom-made ImageJ plugin, Root Image Analysis-J (RIA-J). For each image, we extracted a set of classical root image descriptors, such as the total root length, the projected area and the number of visible root tips (fig. 2E). In addition, we included shape descriptors such as the convex-hull area or the exploration ratio (see Supplemental File 1 for details of RIA-J). The traits and algorithms used by our pipeline are listed in table 2.
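Some of the shape descriptors mentioned above can be computed directly from a binary (root vs. background) image. The sketch below is a hypothetical Python/NumPy illustration, not the RIA-J code: it assumes the image has already been thresholded into a boolean mask, and computes the projected area, width, depth and convex-hull area (Andrew's monotone chain for the hull, the shoelace formula for its area). Descriptors such as total root length or the exploration ratio require further steps (e.g. skeletonisation) and are not covered.

```python
import numpy as np


def _cross(o, a, b):
    # z-component of (a - o) x (b - o)
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain: hull vertices in counter-clockwise order (collinear points dropped)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and _cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and _cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def polygon_area(vertices):
    """Shoelace formula for a simple polygon given as an ordered vertex list."""
    s = 0.0
    for i in range(len(vertices)):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % len(vertices)]
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0


def shape_descriptors(mask, pixel_size=1.0):
    """Hypothetical descriptor set from a boolean root mask (rows = depth axis)."""
    rows, cols = np.nonzero(mask)
    if rows.size == 0:
        return {"area": 0.0, "width": 0.0, "depth": 0.0, "convex_hull_area": 0.0}
    # Hull of pixel centres: slightly smaller than the hull of the pixel outlines
    hull = convex_hull(list(zip(cols.tolist(), rows.tolist())))
    return {
        "area": rows.size * pixel_size ** 2,  # projected area = number of root pixels
        "width": (cols.max() - cols.min() + 1) * pixel_size,
        "depth": (rows.max() - rows.min() + 1) * pixel_size,
        "convex_hull_area": polygon_area(hull) * pixel_size ** 2,
    }
```

At 300 DPI, a pixel size of 25.4 / 300 mm would express these values in millimetres.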

Data analysis
Data analysis was performed in R (R Core Team). Plots were created using ggplot2 (Wickham, 2009) and lattice (Sarkar, 2008). The Mean Relative Error (MRE) was estimated using the equation:

$$\mathrm{MRE} = \frac{1}{n}\sum_{i=1}^{n}\frac{\lvert \hat{y}_i - y_i \rvert}{y_i}$$

where $n$ is the number of observations, $y_i$ is the ground-truth value and $\hat{y}_i$ is the estimated value.
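The MRE formula above translates directly into code. The study used R; the following Python function is only an equivalent sketch, assuming strictly positive ground-truth values (true for root length, area, counts, etc.).

```python
def mean_relative_error(truth, estimate):
    """MRE = (1/n) * sum(|estimate_i - truth_i| / truth_i).

    Assumes all ground-truth values are strictly positive."""
    if len(truth) != len(estimate) or len(truth) == 0:
        raise ValueError("truth and estimate must be non-empty and of equal length")
    return sum(abs(e - t) / t for t, e in zip(truth, estimate)) / len(truth)
```

For instance, ground-truth lengths of 100 and 200 estimated as 90 and 220 each have a 10% relative error, so the MRE is 0.1.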

Random Forest Framework
A random forest is a state-of-the-art machine learning algorithm typically used for making new predictions, in both classification and regression tasks. Random forests can capture non-linear relationships and therefore often outperform linear models. Since their introduction by Breiman (Breiman, 2001), random forests have been widely used in many fields, from gene regulatory network inference to generic image classification (…). The algorithm relies on growing a multitude of decision trees, a prediction algorithm that performs well on its own but, when combined with other decision trees (hence the name forest), returns predictions that are much more robust to outliers and noisy data (see bootstrap aggregating, Breiman, 1996).

In a machine learning setting, one is given a set $\mathcal{D} = \{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\}$, where each $x_i$ is an element of a $d$-dimensional feature space $X$ and $y_i$ is the corresponding element of the response space $Y$. The learning task is to find a model

$$f: X \to Y$$

that predicts the data well, where goodness is measured with respect to an error function.

A decision tree $T$ is a machine learning method that, for a dataset $\mathcal{D}$, constructs a binary tree with each node representing a binary question and each leaf a value of the response space. In other words, a prediction can be made from an input value by following the set of binary questions that leads to a leaf (e.g. is the primary root bigger than $q_1$? If yes, is the number of secondary roots smaller than $q_2$? If no, …). Each decision is based on exactly one feature and determines which branch of the tree a given input value must take. Hence, a decision tree successively splits the set $\mathcal{D}$ into smaller subsets and assigns them a value $\hat{y}_i = T(x_i)$ of the response space. The choice of the feature used for splitting depends on a relevance criterion.
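The procedure described in this section (binary questions on one feature at a time, bootstrap samples, a random subset of features at each node, and averaging of tree predictions in regression) can be sketched in a few lines of Python/NumPy. This is a minimal illustration under stated assumptions, not the randomForest R package used in the study: splits are scored by the reduction in the sum of squared errors, the usual criterion for regression trees, rather than by the Gini index, and all function names and defaults (`max_depth`, `min_leaf`, one third of the features per node) are hypothetical choices.

```python
import numpy as np


def build_tree(X, y, rng, max_features, min_leaf=5, depth=0, max_depth=8):
    """Grow a regression tree. Internal nodes are tuples (feature, threshold, left, right)
    encoding the binary question "x[feature] <= threshold?"; leaves are floats."""
    if depth >= max_depth or len(y) < 2 * min_leaf or np.all(y == y[0]):
        return float(y.mean())  # leaf: a value of the response space
    # Random subspace: only a random subset of the features is considered at this node
    features = rng.choice(X.shape[1], size=max_features, replace=False)
    best = None  # (sum of squared errors, feature, threshold)
    for f in features:
        for thr in np.unique(X[:, f])[:-1]:  # candidate cuts on this single feature
            left = X[:, f] <= thr
            n_left = int(left.sum())
            if n_left < min_leaf or len(y) - n_left < min_leaf:
                continue
            sse = (((y[left] - y[left].mean()) ** 2).sum()
                   + ((y[~left] - y[~left].mean()) ** 2).sum())
            if best is None or sse < best[0]:
                best = (sse, int(f), float(thr))
    if best is None:  # no admissible split: make a leaf
        return float(y.mean())
    _, f, thr = best
    left = X[:, f] <= thr
    return (f, thr,
            build_tree(X[left], y[left], rng, max_features, min_leaf, depth + 1, max_depth),
            build_tree(X[~left], y[~left], rng, max_features, min_leaf, depth + 1, max_depth))


def predict_tree(node, x):
    """Follow the binary questions from the root down to a leaf."""
    while isinstance(node, tuple):
        f, thr, left, right = node
        node = left if x[f] <= thr else right
    return node


def fit_forest(X, y, n_trees=50, max_features=None, seed=0):
    rng = np.random.default_rng(seed)
    max_features = max_features or max(1, X.shape[1] // 3)
    forest = []
    for _ in range(n_trees):
        idx = rng.integers(0, len(y), size=len(y))  # bootstrap sample (bagging)
        forest.append(build_tree(X[idx], y[idx], rng, max_features))
    return forest


def predict_forest(forest, X):
    # Regression: average of all tree predictions (classification would use a majority vote)
    return np.array([np.mean([predict_tree(t, x) for t in forest]) for x in X])
```

With the study's data, `X` would hold the image descriptors of the training images and `y` one ground-truth trait, with one forest fitted per trait.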
In our setting, the default relevance criterion from the randomForest R package (CRAN randomForest, 2015), namely the Gini index, has been used.

A random forest consists of $K$ decision trees $T_k$ ($k = 1, \ldots, K$), where several key parameters, such as the feature subspace, are chosen randomly (hence the word Random in the algorithm name). While using a random subspace strongly accelerates the growth of a single tree, it can also decrease its accuracy. However, using a large number of trees offsets this loss of accuracy while retaining the gain in speed. The final prediction for each input value $x_i$ corresponds to the majority vote of all the decision trees of the forest, $T_k(x_i)$, in a classification setting, while the average of all predicted values is used in a regression task.

Results and discussions

Production of a large library of ground-truth root system images
We combined existing tools into a single pipeline to produce a large library of ground-truth root system images. The pipeline combines a root model (ArchiSimple; Pagès et al., 2013), the Root System Markup Language (RSML) and the RSML Reader plugin from ImageJ. In short, ArchiSimple was used to create a large number of root systems based on random input parameter sets. Each output was stored as an RSML file (fig. 2A), which was then used by the RSML Reader plugin to create a graphical representation of the root system (as a .jpeg file) and a ground-truth dataset (fig. 2B). Details about the different steps are presented in the Materials and Methods section.

We used the pipeline to create a library of 10,000 root system images, separated into fibrous systems (multiple first-order roots and no secondary growth) and tap-root systems (one first-order root and secondary growth). The ranges of the different ground-truth data are shown in table 3 and their distributions in Supplemental Figure 1.

We started by evaluating whether fibrous and tap-root systems should be separated during the analysis. We performed a Principal Component Analysis (PCA) on the ground-truth dataset to reduce its dimensionality and to assess whether the root system type influenced the overall structure of the dataset (fig. 3A). Fibrous and tap-root systems formed distinct groups (MANOVA p-value < 0.001), with limited overlap. The first principal component, which represented 30.9% of the variation within the dataset, was mostly influenced by the number of primary axes. The second principal component (19.1% of the variation) was influenced, in part, by the root diameters. These two effects were consistent with the clear grouping by root system type, since they expressed the main differences between the two groups.
Since the root system type had such a strong effect on the overall structure of the dataset, we analysed fibrous and tap-root systems separately in the following analyses.
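The PCA step above can be sketched with NumPy through a singular value decomposition. The study ran its analyses in R; this Python version is an illustration only, and it assumes (not stated in the text) that each ground-truth trait was standardized first, since traits such as length, diameter and counts are on very different scales. With that assumption it is comparable to R's `prcomp(x, scale. = TRUE)`, up to the sign of each component.

```python
import numpy as np


def pca(data, n_components=2):
    """PCA on standardized columns.

    data: one row per root system, one column per ground-truth trait.
    Returns (scores, loadings, explained variance ratio) for the first components."""
    centered = data - data.mean(axis=0)
    std = centered.std(axis=0, ddof=1)
    z = centered / np.where(std > 0, std, 1.0)  # avoid dividing constant columns by zero
    u, s, vt = np.linalg.svd(z, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)  # share of total variance per component
    scores = u[:, :n_components] * s[:n_components]  # coordinates of each root system
    loadings = vt[:n_components].T  # weight of each trait in each component
    return scores, loadings, explained[:n_components]
```

Plotting the first two columns of `scores`, coloured by root system type, would give a view similar to figure 3A.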

Systematic evaluation of root image descriptors
To demonstrate the utility of a synthetic library of ground-truth root systems, we analysed every image of the library using a custom-built root image analysis tool, RIA-J. We chose to do so because our purpose was to test the usefulness of the synthetic library, not to assess the accuracy of existing tools. Nonetheless, RIA-J was designed using known and published algorithms that are often used in root system quantification. A detailed description of RIA-J can be found in the Materials and Methods section and in Supplemental File 1.

We extracted 10 descriptors from each root system image (table 2) and compared them with the corresponding ground-truth data. For each descriptor-data pair, we performed a linear regression and computed its r-squared value. Figure 4 shows the results for the different combinations for both root system types. Overall, correlations were poor, with only 3% of the combinations having an r-squared above 0.8. In addition, for some ground-truth data, such as the mean lateral root length or the number of primary roots, none of the descriptors gave a good estimation (fig. 4, highlighted with arrows).
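The pairwise screening described above can be sketched as follows. For a simple linear regression of one variable on another, the r-squared equals the squared Pearson correlation, so no explicit model fit is needed. The dictionary layout (descriptor or trait name mapped to one value per image) is a hypothetical choice for illustration; constant columns would yield NaN.

```python
import numpy as np


def r_squared(x, y):
    """r-squared of the simple linear regression y ~ x (the squared Pearson correlation)."""
    r = np.corrcoef(x, y)[0, 1]
    return float(r ** 2)


def r_squared_matrix(descriptors, ground_truth):
    """r-squared for every (descriptor, ground-truth trait) pair.

    Both arguments map a name to a sequence with one value per image (same image order)."""
    return {(d, g): r_squared(descriptors[d], ground_truth[g])
            for d in descriptors for g in ground_truth}


def share_above(matrix, threshold=0.8):
    """Fraction of descriptor-trait pairs whose r-squared exceeds the threshold."""
    values = list(matrix.values())
    return sum(v > threshold for v in values) / len(values)
```

Running this separately on the fibrous and tap-root subsets gives one matrix per root system type, as in figure 4.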

It should also be noted that the correlations differed between fibrous and tap-root systems. For example, the correlation between mean_lat_diameter and diam_mean was better for fibrous roots than for tap roots. Consequently, image analysis algorithms should be validated at least separately for each group: an algorithm giving good results for fibrous root systems might fail when applied to tap-rooted ones.

Errors from image descriptors are likely to be non-linear across root system sizes and image qualities

In addition to depending on the species studied, estimation errors are likely to increase with root system size. As the root system grows and develops, the number of crossing and overlapping segments increases (fig. 5A), making the subsequent image analysis potentially more difficult and prone to error. However, such errors are seldom analysed systematically.

Figure 5 shows the relationship between the ground-truth and descriptor values for three parameters: the total root length (fig. 5B), the number of roots (fig. 5C) and the root system depth (fig. 5D). For each of these variables, we quantified the Mean Relative Error (see Materials and Methods for details) as a function of the overlap index, for three levels of noise added to the images ("null", "medium" and "high"). For both the total root length and the number of lateral roots, the Mean Relative Error increased with the size of the root system (fig. 5B-C). As stated above, such an increase in error was to be expected with increasing complexity. Moreover, for some metrics, such as the number of root tips, low image quality can result in a high level of error. For other traits, such as the root system depth, no errors were expected (depth is supposedly an error-free variable), and the Mean Relative Error was close to 0 regardless of root system size and image quality.

The results presented here depend tightly on the specific algorithms used for image analysis and might therefore differ for other published tools. However, they are a call for caution when analysing root images: unexpected errors in ground-truth estimation can arise. Our image library can be used to better identify the errors generated by other analysis tools, current or future.
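The error-versus-complexity analysis above can be sketched by binning images on their overlap index and computing the Mean Relative Error per bin and per noise level. The record layout (noise level, overlap index, ground-truth value, descriptor value) and the default of 10 equal-width bins are hypothetical illustration choices, not the study's settings; ground-truth values are assumed strictly positive.

```python
from collections import defaultdict


def mre_by_overlap(records, n_bins=10):
    """records: iterable of (noise_level, overlap_index, truth, estimate), overlap_index in [0, 1].

    Returns {(noise_level, bin_lower_edge): mean relative error}."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for noise, overlap, truth, estimate in records:
        # Multiplying by an integer bin count avoids float pitfalls such as 0.3 // 0.1 == 2.0;
        # an overlap index of exactly 1.0 falls into the last bin
        b = min(int(overlap * n_bins), n_bins - 1)
        sums[(noise, b)] += abs(estimate - truth) / truth
        counts[(noise, b)] += 1
    return {(noise, b / n_bins): sums[(noise, b)] / counts[(noise, b)]
            for noise, b in sorted(sums)}
```

Plotting the values for each noise level against the bin edges gives curves comparable in spirit to figure 5B-D.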

The main advantage of creating a synthetic library is that it generates paired datasets of image descriptors and their corresponding ground-truth values. Such paired data can, in theory, be used either to calibrate the image analysis pipeline or to identify the best descriptors for the ground-truth traits of interest. Here, we explored the second approach and used a random forest algorithm to find which combination of descriptors best describes each ground-truth trait (see Materials and Methods for details). In short, we randomly divided the whole dataset into training (3/4) and testing (1/4) subsets. The training set was used to create a random forest model for each ground-truth trait, which we then applied to the test set. The accuracy of these new predictions was then compared to the accuracy of the direct method (single descriptors) (fig. 2C).

Figure 6 compares the accuracy of both methods for each ground-truth trait, using both the r-squared values from linear regressions and the Mean Relative Error (MRE). The random forest approach always performed better (sometimes substantially) than the direct approach, even for images with a high level of noise. In addition, for most traits, the r-squared and MRE values were above 0.9 and below 0.1, respectively, which is very good, especially for such a wide range of images. The random forest approach also allowed the correct estimation of traits that were difficult to estimate with the direct approach (such as the number of primary axes or the mean lateral root density).

Figure 7 shows the detailed comparison of both methods for the estimation of the total root length. Again, a clear improvement was visible with the random forest method, leading to small errors even with large root systems and noisy images.
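The evaluation protocol above (random 3/4 training and 1/4 test split, then accuracy on the test set) can be sketched in Python/NumPy. The model itself is left pluggable: any function returning test-set predictions can be scored with `evaluate`. As an assumption for illustration, the "direct method" is shown here as a linear calibration of a single descriptor fitted on the training set; the text does not specify whether the direct method was calibrated this way. All function names are hypothetical.

```python
import numpy as np


def train_test_split(n, train_fraction=0.75, seed=0):
    """Randomly assign n images to a training subset (3/4) and a test subset (1/4)."""
    idx = np.random.default_rng(seed).permutation(n)
    cut = int(round(train_fraction * n))
    return idx[:cut], idx[cut:]


def direct_method(desc_train, truth_train, desc_test):
    """Assumed direct method: linear calibration of one descriptor, applied to the test images."""
    slope, intercept = np.polyfit(desc_train, truth_train, deg=1)
    return slope * np.asarray(desc_test) + intercept


def evaluate(truth, estimate):
    """r-squared of the linear relation between truth and estimate, plus the Mean Relative Error."""
    truth, estimate = np.asarray(truth, float), np.asarray(estimate, float)
    r = np.corrcoef(truth, estimate)[0, 1]
    mre = np.mean(np.abs(estimate - truth) / truth)
    return {"r_squared": float(r ** 2), "mre": float(mre)}
```

The same split indices would be reused for every method, so that the direct and random forest predictions are compared on exactly the same test images.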

In our study, machine learning algorithms trained on simulated datasets seem to yield very good results, and we believe they open new avenues for root system analyses. It is clear, however, that their value relies on the quality and relevance of the training dataset relative to the test dataset, and that they must be carefully designed.

4 Conclusions

The automated analysis of root system images is routinely performed in many research projects. Here, we used a library of 10,000 synthetic images to estimate the accuracy and usefulness of different image descriptors extracted with a homemade root image analysis pipeline. Our study highlighted some limitations and biases of the image analysis process.

We found that the type of root system (fibrous vs. tap-rooted), its size and complexity, as well as the quality of the images, had a strong influence on the accuracy of some commonly used image descriptors and on their meaning and relevance for ground-truth extraction. So far, a large proportion of root research has focused on seedlings with small root systems and has de facto avoided such errors.

However, as research questions are likely to focus more on mature root systems in the future, these limitations will become critical. We showed that synthetic datasets can be used for calibration or modelling (machine learning) steps that allow ground-truth extraction from comparable images. We hope that our library will help the root research community to evaluate and improve other image analysis pipelines.

5 Conflict of Interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

[Figure caption fragment: … ground-truth data that cannot be accurately described with the different descriptors. The arrows were doubled when this was the case for both fibrous and tap-rooted root systems.]

[Figure panel: Root system depth]