Pixel-based machine learning and image reconstitution for dot-ELISA pathogen serodiagnosis

Serological methods serve as a direct or indirect means of diagnosing pathogen infection in plant and animal species, including humans. Dot-ELISA (DE) is an inexpensive and sensitive, solid-state version of the microplate enzyme-linked immunosorbent assay, with a broad range of applications in epidemiology. Yet, its applicability is limited by uncertainties in the qualitative output of the assay due to overlapping dot colorations of positive and negative samples, stemming mainly from the inherent color discrimination thresholds of the human eye. Here, we report a novel approach for unambiguous DE output evaluation that applies machine learning-based pattern recognition to the image pixels of the blot, using an impartial predictive model rather than human judgment. Supervised machine learning was used to train a classifier algorithm through a multivariate logistic regression model built on the RGB ("Red", "Green", "Blue") pixel attributes of a scanned DE output of samples of known infection status with respect to a model pathogen (Lettuce big-vein associated virus). Based on the trained and cross-validated algorithm, pixel probabilities of unknown samples could be predicted in scanned DE output images, which were then reconstituted from pixels with probabilities above a cutoff. The cutoff may be selected at will to yield desirable false positive and false negative rates depending on the question at hand, thus allowing for proper dot classification of positive and negative samples and, hence, accurate diagnosis. Potential improvements and diagnostic applications of the proposed versatile method, which translates unique pathogen antigens into the universal basic color language, are discussed.


The RGB model displayed the best performance characteristics for infection status prediction

Among candidate models for infection status prediction, those with B, R and G as single predictors were dismissed for having the highest Akaike information criterion (AIC) values (means ± SEM = 29,592.0 ± 12.69, 21,554.0 ± 11.62 and 20,449.1 ± 11.42, respectively). The R+G combination also had a similarly increased AIC value (19,548.1 ± 11.52), while the AIC of R+B was still elevated, but at about a third of the R+G value (6,263.0 ± 9.42). Although the AIC value of the G+B pairwise combination was further reduced (2,395.0 ± 9.40), it was still larger than that of the model constructed using all predictors combined; R+G+B had the lowest AIC (2,382.2 ± 7.05) and was thus selected for training the classifier algorithm. Using the selected model, the R, G and B variable coefficients ± SEM were all found to be highly significant (p < 10^-8) and estimated at β1 = 0.08 ± 0.015, β2 = -0.72 ± 0.017 and β3 = 0.54 ± 0.014, respectively, implying a minor contribution of the R variable to positive pixel predictability and a substantial but reciprocal effect of the G and B variables.

The chosen model with all three color predictors displayed the best performance indicators at the 0.135 ± 0.0022 cutoff, which was the crossing point between sensitivity (0.9888 ± 0.00009) and specificity (0.9882 ± 0.00009) across the entire cutoff range (Figure 6). At this point the two variables differed the least (Difference = 0.0006, Table 2) and, consequently, FP and FN rates were at their nadir. Accuracy was also superior for the three-predictor model at this cutoff (0.9884 ± 0.00009) and inaccuracy was about 0.0116 ± 0.00009, which translates to about 230 faulty pixels out of the 19,593 pixels of the test subset.
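Under the fitted model, a pixel's probability of being positive is the logistic transform of a linear combination of its R, G and B values. A minimal Python sketch of this scoring step follows; the intercept β0 is a hypothetical placeholder, as its value is not reported above, and only the three slope coefficients come from the text:

```python
import math

# Slope coefficients reported for the R+G+B logistic model.
# BETA0 is an assumed intercept for illustration only (not reported).
BETA0 = 0.0
BETA_R, BETA_G, BETA_B = 0.08, -0.72, 0.54

def pixel_probability(r, g, b):
    """Logistic (sigmoid) transform of the linear predictor."""
    z = BETA0 + BETA_R * r + BETA_G * g + BETA_B * b
    return 1.0 / (1.0 + math.exp(-z))

def classify_pixel(r, g, b, cutoff=0.135):
    """A pixel is retained as 'positive' if its probability exceeds the cutoff."""
    return pixel_probability(r, g, b) > cutoff
```

Consistent with the coefficient signs, blue-dominated pixels score high probabilities while green-dominated pixels score low ones.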
The excellent performance of the RGB model was also evidenced by the constructed ROC curve (Figure 7), which revealed an Area Under the Curve (AUC) value very closely approximating unity (0.9972 ± 0.00006). The optimal cutoff of 0.135 corresponds to a 98.9% success rate of TP and TN (1 − 0.0112) predictions, revealing truly positive samples with a very high probability of success. Positive controls produced a robust signal throughout the range of examined cutoffs, while negative controls did not produce a signal at any considered cutoff value, as was also the case for the buffer-alone negative controls.

Table 3 shows the relative preponderance, by increasing cutoff values, of positive versus negative pixels of unknown samples classified by the trained algorithm, as well as the corresponding dots, either calculated from pixels by accepting a mean value of 237 pixels per dot or actually detected in the reconstituted images of the DE output of unknown samples. As the cutoff increased, so did the negative pixels and the corresponding dots, whether calculated from the pixels or actually detected (the two categories differed by less than 3%, i.e., by a maximum of two dots out of the 72 unknown samples). The opposite trend was noted among positively detected pixels and the respective dots, both of which decreased as the cutoff increased. A dot yielding a positive signal at the 0.050 cutoff (E4) appeared negative at the 0.135 cutoff, which corresponds to the lowest FP and FN rates (Figure 8-III). At the more elevated cutoff of 0.500 (Figure 8-IV), some dots that appeared positive at the 0.135 cutoff were found to be negative (e.g., E4:F4), whereas some others appeared sparsely populated by positive pixels (e.g., C8:D8, C10:D10 and E7:F7).

Unambiguous dot classification and infection status prediction of unknown samples
These findings are consistent with a trend towards increasing FN rates at higher cutoff values, as also suggested by the pixel vs. dot data presented in Table 3. Thus, at the 0.800 cutoff more positive pixels were lost from the sparsely populated dots detected at the previous cutoff; some of these dots appeared to be negative (e.g., C8:D8), whereas some others were still positive (e.g., C3:D3, E8:F8) (Figure 8-V). At the highest cutoff examined (0.995), where sensitivity and FP rates are expected to be even lower, dots yielding positive signals had the highest probability of being TP (Figure 8-VI). Here, only the darkest dots maintained their compactness, whilst most sparsely populated dots detected at the previous cutoff appeared negative (e.g., C3:D3 and E8:F8).

To test the versatility of the method, we examined an additional DE output loaded with representative dilutions of Mirafiori lettuce big vein virus (MilBVV), a virus unrelated to the training procedure. All TP and TN samples were correctly predicted as such in the reconstituted image of the scanned DE output, corroborating the versatility of the method (Figure 9).

[…] ELISA kits, where manufacturers often claim comparable sensitivity and specificity to other commercially available, and even licensed, ELISA kits, without elaborating further on the actual diagnostic performance of the assays.

[…] phytopathological practices, complementing the so-called molecular methods, exemplified by, but not restricted to, hybridization and PCR, or the recently developed advanced sequencing methods known as next (NGS) and third generation (TGS) sequencing.
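The pixel-to-dot bookkeeping behind Table 3 can be sketched as follows; the mean of 237 pixels per dot is taken from the text, while the counting helper itself is an illustrative Python re-implementation, not the authors' R code:

```python
MEAN_PIXELS_PER_DOT = 237  # mean dot size reported in the text

def summarize_pixels(probabilities, cutoff):
    """Count positively and negatively classified pixels at a given cutoff
    and convert the positive-pixel tally into an estimated dot count."""
    positive = sum(1 for p in probabilities if p > cutoff)
    negative = len(probabilities) - positive
    # Estimated number of positive dots, assuming the reported mean dot size.
    estimated_dots = round(positive / MEAN_PIXELS_PER_DOT)
    return positive, negative, estimated_dots
```

As the cutoff rises, the positive-pixel tally (and hence the estimated positive dot count) falls, mirroring the trend described above.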
Although extremely useful as an ultra-sensitive and highly specific diagnostic tool, PCR, in its various formats, requires laborious optimization and stringent conditions to avoid the risk of contamination; it is also costly and time-consuming, with rather low throughput, since nucleic acid isolation and possibly reverse transcription steps have to precede it. The newly developed NGS and TGS, on the other hand, are highly informative for unknown or candidate pathogens, but are still very expensive for diagnostic purposes and tend to yield results that are difficult to interpret, particularly in the field of etiological host-pathogen interactions. Compared to these approaches, serological methods are cheaper, require less sophisticated equipment and are better suited to large surveys for the detection of known pathogens in established host-pathogen relationships. Serological monitoring is essential, in particular, to the design and evaluation of effective vaccination programs. DE inherits all the advantages of the maternal ELISA application and overcomes most of its disadvantages pertaining to cost and the need for specialized equipment, rendering it an attractive alternative for serodiagnosis in epidemiologic research once the problem of output evaluation is resolved.

Besides fully addressing the issue of output evaluation in an objective and reliable manner, a major advantage of our method is that it allows for cutoff selection, thus enabling decisions to be made on FP and FN acceptance rates suited to the diagnostic question at hand; furthermore, […]

[…] in the image (i.e., the respective dots, but also including representative regions of the background) and storing and processing the X-Y coordinates and RGB information of the pixels using the ImageJ package (Schneider et al., 2012).

To select the most appropriate predictors, the AIC value of different logit models was considered.
Since AIC provides an estimate of the relative amount of information lost by a given model, the less information a model loses, the higher its quality (Akaike, 1974; Hosmer et al., 2013). Each model was constructed using either a single predictor (i.e., R, G or B), one of the possible pairwise predictor combinations (i.e., R+G, R+B or G+B), or all three predictors combined (i.e., R+G+B). In all cases, the 'Dilution' category and variable interactions were excluded at this stage for simplicity. Logistic regression was performed using the "glm" function in "R" (R Core Team, 2018).
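The model comparison above was done with R's "glm"; an equivalent can be sketched in Python with a plain Newton-Raphson (IRLS) logistic fit and the definition AIC = 2k − 2·logL. This is an illustrative re-implementation under that assumption, not the authors' code:

```python
import numpy as np

def fit_logistic(X, y, iters=25):
    """Fit logistic regression by Newton-Raphson (intercept included);
    return the coefficients and the maximized log-likelihood."""
    X = np.column_stack([np.ones(len(X)), X])  # prepend intercept column
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        W = p * (1.0 - p)
        grad = X.T @ (y - p)
        hess = (X * W[:, None]).T @ X          # Fisher information X'WX
        beta += np.linalg.solve(hess + 1e-9 * np.eye(len(beta)), grad)
    p = np.clip(1.0 / (1.0 + np.exp(-X @ beta)), 1e-12, 1 - 1e-12)
    loglik = np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))
    return beta, loglik

def aic(loglik, n_params):
    """Akaike information criterion: lower means less information lost."""
    return 2 * n_params - 2 * loglik
```

Fitting each candidate predictor set to the same pixel data and keeping the model with the lowest AIC reproduces the selection logic described above.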

Step 3: Validation of the trained algorithm

Training was performed using the best-fit logistic model and the prototypic subset. To avoid data overfitting (overtraining), an additional cross-validation step was undertaken using two random subsets obtained from the prototypic DE dataset by employing the "caTools" library (Tuszynski, 2018) in R (R Core Team, 2018): a training subset holding about 70% of the pixels and a test subset holding the remaining 30% of the pixels of the original (prototypic) image (Figure 2). For validation, predictions on the "training" and "testing" subsets were made by the "predict" function in "R" based on the logistic training model and the training subset. Confusion matrices were constructed at 0.01 intervals for each randomly created dataset and critical diagnostic parameters, such as the Cutoff, Sensitivity, Specificity, Accuracy and Error, were calculated; in standard terms, Sensitivity is TP/(TP+FN), Specificity is TN/(TN+FP), Accuracy is (TP+TN)/(TP+TN+FP+FN) and Error is (FP+FN)/(TP+TN+FP+FN). The optimal cutoff was estimated as the probability at which Sensitivity and Specificity differed the least. The above process was repeated 100 times using code written in "R" (R Core Team, 2018). For validation purposes, the same diagnostic parameters of each random dataset were obtained using the "performance" function of the "ROCR" library (Sing et al., 2005) in "R" (R Core Team, 2018).

To obtain 95% confidence intervals (CI) for the mean of each variable, the corresponding dataset containing a random sample (n=100) of each of the parameters was bootstrapped 10,000 times using the "boot" library (Davison and Hinkley, 1997) in "R" (R Core Team, 2018).
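The per-cutoff confusion matrix and the standard diagnostic parameters derived from it can be sketched in Python (the paper's analysis used R's ROCR; this is an illustrative re-implementation of the same bookkeeping):

```python
def confusion(probs, labels, cutoff):
    """Tally TP, FP, TN, FN for predicted probabilities at a given cutoff."""
    tp = fp = tn = fn = 0
    for p, y in zip(probs, labels):
        predicted_positive = p > cutoff
        if predicted_positive and y == 1:
            tp += 1
        elif predicted_positive and y == 0:
            fp += 1
        elif not predicted_positive and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn

def diagnostics(tp, fp, tn, fn):
    """Standard diagnostic parameters from a confusion matrix."""
    total = tp + fp + tn + fn
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / total,
        "error": (fp + fn) / total,
    }
```

Evaluating `confusion` at 0.01 cutoff intervals and feeding each tally to `diagnostics` yields the per-cutoff curves from which the optimal cutoff is read off.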
To ensure reproducibility and integrity of the datasets used, which could be influenced by stochasticity introduced during dataset splitting, each of the 100 splitting events was controlled by the R base "set.seed" function, taking incremental integer arguments from 1 to 100, with each integer corresponding to one splitting event.

Step 4: ROC curve analysis and cutoff selection

The ROC curve of each dataset was constructed using the "ROCR" library (Sing et al., 2005) in "R". Representative cutoffs and the corresponding Sensitivity and Specificity values were obtained from the constructed ROC curves. The optimum cutoff and the corresponding variables were obtained by finding the minimum absolute difference between Sensitivity and Specificity, which coincided with the crossing point of the two variables. AUC was calculated at the selected cutoff of 0.135.
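The cutoff-selection rule, scanning candidate cutoffs for the smallest |Sensitivity − Specificity|, can be sketched in a few lines of Python (an illustrative stand-in for scanning ROCR's per-cutoff output in R):

```python
def optimal_cutoff(probs, labels, step=0.01):
    """Scan cutoffs and return the one minimizing |sensitivity - specificity|,
    i.e. the crossing point of the two curves."""
    best_cutoff, best_diff = None, float("inf")
    cutoff = 0.0
    while cutoff <= 1.0:
        tp = sum(1 for p, y in zip(probs, labels) if p > cutoff and y == 1)
        fn = sum(1 for p, y in zip(probs, labels) if p <= cutoff and y == 1)
        tn = sum(1 for p, y in zip(probs, labels) if p <= cutoff and y == 0)
        fp = sum(1 for p, y in zip(probs, labels) if p > cutoff and y == 0)
        sens = tp / (tp + fn) if tp + fn else 0.0
        spec = tn / (tn + fp) if tn + fp else 0.0
        if abs(sens - spec) < best_diff:
            best_cutoff, best_diff = cutoff, abs(sens - spec)
        cutoff = round(cutoff + step, 10)  # avoid float drift in the grid
    return best_cutoff
```

For well-separated positive and negative pixel scores, the returned cutoff sits where both curves are simultaneously high, balancing FP and FN rates.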

Step 5: Infection status prediction of unknown samples

To investigate the diagnostic performance of the method and its applicability to real-world situations, we used lettuce plants grown in a field infested with the fungus Olpidium virulentus, a known vector of LBVaV (Roggero et al., 2000). Sap from lettuce leaves was extracted (1 g/10 ml) in phosphate buffer (0.01 M, pH 7.4) and centrifuged at low speed (10,000 g) for ten minutes. The supernatant was collected and loaded onto nitrocellulose membranes, which were processed with LBVaV-specific, AP-conjugated antibodies and NBT/BCIP substrate as described previously. To further examine the detection performance of the method for an antigen other than that on which the training model was based, we followed the described procedure for a DE blot loaded with representative dilutions of Mirafiori lettuce big vein virus (MilBVV).

To further examine the association between the predicted pixels' classification and dot infection status, we manually collected all X-Y coordinates and RGB information of the dots of the unknown samples using ImageJ (Schneider et al., 2012). Subsequently, we made predictions using the trained algorithm and calculated the total positively and negatively classified pixels across the range of examined cutoffs.

Figure 1. Overview of the proposed machine learning approach for DE output evaluation, which harnesses pixel information for dot classification and diagnosis in merely a few hundred seconds after running the blot using standard methods. The DE output is first scanned (at a resolution of 150 dpi) and the prototypic image is converted to a matrix holding position and color information for all pixels contained in the dots (1). The matrix is then randomly split into a training and a testing dataset at a 70:30 ratio. Next, training of the classifier algorithm (2) is supervised by a logistic model of choice which, following validation (3), is used for infection status predictions on the testing subset. ROC curve analysis allows for selection of an appropriate cutoff (4). The procedure of scanning the blot and harnessing its pixel information is repeated on a DE of unknown samples (5). Pixel probabilities of the unknown samples are predicted by the trained classifier and a new image of the blot, with dots consisting of pixels with probabilities above the selected cutoff, is reconstituted, allowing for proper dot classification and, hence, unambiguous diagnosis.
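The reconstitution step described in the Figure 1 caption amounts to a masking operation: keep pixels whose predicted probability exceeds the cutoff, blank the rest. A minimal NumPy sketch (the probability map would come from the trained classifier; here it is simply an input array):

```python
import numpy as np

def reconstitute(image, prob_map, cutoff, background=255):
    """Rebuild the blot image keeping only pixels whose predicted
    probability of being 'positive' exceeds the cutoff; all other
    pixels are set to a uniform background value."""
    keep = prob_map > cutoff                # boolean mask, shape (H, W)
    out = np.full_like(image, background)   # start from a blank membrane
    out[keep] = image[keep]                 # copy retained pixels (all channels)
    return out
```

Raising the cutoff blanks more pixels, which is exactly how dots become sparsely populated and eventually disappear in the reconstituted images at high cutoffs.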