Predicting clinical endpoints and visual changes with quality-weighted tissue-based renal histological features

Two common obstacles limiting the performance of data-driven algorithms in digital histopathology classification tasks are the lack of expert annotations and the narrow diversity of datasets. Multi-instance learning (MIL) can address the former challenge for the analysis of whole slide images (WSI), but performance is often inferior to full supervision. We show that the inclusion of weak annotations can significantly enhance the effectiveness of MIL while keeping the approach scalable. An analysis framework was developed to process periodic acid-Schiff (PAS) and Sirius Red (SR) slides of renal biopsies. The workflow segments tissues into coarse tissue classes. Handcrafted and deep features were extracted from these tissues and combined using a soft attention model to predict several slide-level labels: delayed graft function (DGF), acute tubular injury (ATI), and Remuzzi grade components. A tissue segmentation quality metric was also developed to reduce the adverse impact of poorly segmented instances. The soft attention model was trained using 5-fold cross-validation on a mixed dataset and tested on the QUOD dataset containing n=373 PAS and n=195 SR biopsies. The average ROC-AUC over different prediction tasks was found to be 0.598±0.011, significantly higher than using only ResNet50 (0.545±0.012), only handcrafted features (0.542±0.011), and the baseline (0.532±0.012) of state-of-the-art performance. In conjunction with soft attention, weighting tissues by segmentation quality has led to further improvement (AUC=0.618±0.010). Using an intuitive visualisation scheme, we show that our approach may also be used to support clinical decision making as it allows pinpointing individual tissues relevant to the predictions.

stages, where we had fewer annotations. At this stage, a subset of objects, either delineated by hand or segmented by a single UNet, has been reviewed by a renal pathologist.
This subset was originally picked by hand and assessed randomly by the pathologist. However, after assessing several dozen tissues, we narrowed down the subset further due to the pathologist's time constraints. From this point forward, the order in which tissues were assessed was chosen to maximise the coverage of the tissues' Variational AutoEncoder embedding, following Sener and Savarese (2017). These tissues were not picked at random, as the assessment was intended for a different task beyond the scope of this paper. Note that the way samples were picked might have slightly exaggerated the number of artefacts in the distribution.
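As an illustration of coverage-driven ordering, the following is a minimal sketch of greedy k-center (farthest-point) selection, which is the core-set rule associated with Sener and Savarese (2017). The exact variant used in this study is not stated here, so the function name `k_center_greedy_order`, the Euclidean metric and the toy 2-D embeddings are assumptions made for illustration only.

```python
import math


def k_center_greedy_order(embeddings, seed_indices, n_select):
    """Return indices in the order picked by greedy k-center selection.

    embeddings   : list of equal-length coordinate tuples (e.g. VAE latents)
    seed_indices : indices already assessed (may be empty)
    n_select     : number of further items to pick
    """
    # Distance from every point to its nearest already-selected point.
    min_dist = [
        min((math.dist(e, embeddings[s]) for s in seed_indices), default=math.inf)
        for e in embeddings
    ]
    selected = set(seed_indices)
    order = []
    for _ in range(min(n_select, len(embeddings) - len(selected))):
        # Pick the point that is currently worst covered by the selection.
        candidates = [i for i in range(len(embeddings)) if i not in selected]
        nxt = max(candidates, key=lambda i: min_dist[i])
        order.append(nxt)
        selected.add(nxt)
        # Update coverage distances with the newly selected point.
        for i, e in enumerate(embeddings):
            min_dist[i] = min(min_dist[i], math.dist(e, embeddings[nxt]))
    return order


# Toy example (hypothetical embeddings): item 0 has already been assessed.
toy = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (10.0, 0.0)]
next_items = k_center_greedy_order(toy, seed_indices=[0], n_select=2)
```

Because each pick is the point farthest from everything already chosen, unusual objects such as artefacts tend to be selected early, which is consistent with the note above that artefacts may be over-represented.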
A total of 1992 objects were reviewed, 1032 of which were segmented by UNets and 960 were delineated by hand. These objects have been labelled or predicted as belonging to either the tubule or the glomerulus class. The statistics of tissues are shown in Tables S2 and S3. From Table S2 it can be seen that less than half (157/340) of the tissues labelled as "Tubules" were actually proximal tubules. The rest were objects irrelevant for assessment. However, since this is an ongoing project, the quality of the delineation might have improved over time. A qualitative estimate showed that approximately 70% of the delineated tubules are proximal in the up-to-date dataset.

Figure S5: Comparison of AUC between different featuresets for predicting Remuzzi A. Entries are labelled "1" or "-1" according to whether the row performs better than the column by an AUC difference greater than $\sqrt{\sigma^2_{AUC_1} + \sigma^2_{AUC_2}}$, where σ is the uncertainty estimate of the mean as listed in Table 3. Entries where the difference is smaller than this threshold are labelled "Und". Overall, tissue features outperform tile features, and AUC is higher when weighted.
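The labelling rule in the Figure S5 caption can be sketched as follows. This is a minimal illustration that assumes the threshold is the quadrature sum of the two uncertainties and that "-1" marks the case where the column beats the row by more than that threshold. The function name `compare_auc` is hypothetical, and the example inputs are the task-averaged AUCs quoted in the abstract, used here only as sample numbers.

```python
import math


def compare_auc(auc_row, sigma_row, auc_col, sigma_col):
    """Label a row-vs-column AUC comparison as "1", "-1" or "Und"."""
    # Assumed threshold: uncertainties of the two means added in quadrature.
    threshold = math.sqrt(sigma_row ** 2 + sigma_col ** 2)
    diff = auc_row - auc_col
    if diff > threshold:
        return "1"    # row better than column beyond the combined uncertainty
    if diff < -threshold:
        return "-1"   # column better than row beyond the combined uncertainty
    return "Und"      # difference too small to call


# Quality-weighted soft attention vs unweighted soft attention.
weighted_vs_unweighted = compare_auc(0.618, 0.010, 0.598, 0.011)
# ResNet50 only vs handcrafted features only.
resnet_vs_handcrafted = compare_auc(0.545, 0.012, 0.542, 0.011)
```

With these inputs the first comparison exceeds the combined uncertainty (0.020 against roughly 0.015), while the second does not (0.003 against roughly 0.016).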
A total of 731 proximal tubules have been reviewed by the pathologist if we include tissues that had been wrongly labelled as "glomeruli". These proximal tubules are graded for chronic (TA 0-5) and acute (ATI 0-5) damage. TA was graded according to the amount of thickening of the basement membrane: 0: absent; 1: mild thickening; 2: significant thickening, but less than the thickness of epithelial cells; 3: thickening equal to the thickness of healthy epithelial cells; 4: thicker than healthy epithelial cells; 5: reserved for extreme cases. ATI was graded as follows: 0: absent; 1: segmental/local loss of brush borders/vacuolation of tubular epithelial cells; 2: total loss of brush borders/vacuolation; 3: cell detachment/cellular casts; 4: coagulation necrosis; 5: reserved for extreme cases. The distribution of these grades, broken down by dataset, is shown in Figure S7. It can be seen that TA grades are heavily imbalanced. Between the 2 large datasets (QUOD and NMP), only 8 tissues have been given grade 1 and none have grades above 1. The distribution of ATI grades, on the other hand, is much more evenly spread.

S4.
A value of m = 16 is used for our default UNets (identical to Tam et al. (2020)).
For the segmentation of cell nuclei, we use m = 1 for Blocks 6-8 and m = 8 for other blocks. For the segmentation of tissues at 1.76 mpp, we use m = 8 for Blocks 6 and 8 and m = 4 for Block 7. A smaller number of filters is used to save memory, as a large receptive field is less relevant for the segmentation of cell nuclei and for segmentation at low magnification. In addition to the foreground tissue classes, each UNet also outputs a background class and a boundary class. After obtaining instances using max-flow-min-cut, we expand the area of each instance by looping through the instances iteratively and dilating each mask with a 3x3 kernel until the instance exceeds the boundary or touches another instance. We find that the inclusion of the boundary class helps separate tissues with ambiguous boundaries.
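The instance expansion step can be sketched as below. This is a simplified illustration, not the pipeline's implementation: it assumes a label grid in which 0 marks unassigned pixels, and a hypothetical `allowed` mask marking where growth is permitted (for example, pixels not predicted as background). Growth here stops pixel by pixel, whereas the text describes stopping the whole instance once it touches a neighbour, so the two may differ in detail.

```python
def expand_instances(labels, allowed, max_iter=100):
    """Grow each labelled instance by repeated 3x3 dilation steps.

    labels  : 2-D list of ints, 0 = unassigned, >0 = instance id
    allowed : 2-D list of bools, True where an instance may grow
    """
    h, w = len(labels), len(labels[0])
    out = [row[:] for row in labels]
    ids = sorted({v for row in labels for v in row if v > 0})
    active = set(ids)
    for _ in range(max_iter):
        if not active:
            break
        for inst in ids:
            if inst not in active:
                continue
            # One dilation step: free, allowed pixels whose 3x3
            # neighbourhood already contains this instance.
            grow = []
            for y in range(h):
                for x in range(w):
                    if out[y][x] != 0 or not allowed[y][x]:
                        continue
                    if any(
                        out[ny][nx] == inst
                        for ny in range(max(0, y - 1), min(h, y + 2))
                        for nx in range(max(0, x - 1), min(w, x + 2))
                    ):
                        grow.append((y, x))
            if not grow:
                active.discard(inst)  # blocked by the limit or by neighbours
            for y, x in grow:
                out[y][x] = inst      # never overwrites another instance
    return out


# Two seeds on a single row grow towards each other and meet in the middle.
row = [[1, 0, 0, 0, 0, 0, 2]]
grown = expand_instances(row, [[True] * 7])
```

Looping over instances one step at a time, rather than growing one instance to completion, keeps the split between neighbouring tissues roughly midway between their seeds.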
Ideally, we would want the soft values from the UNet ensemble to scale linearly with the probabilities for correct class prediction. However, this is not the case for data-limited tissue classes such as glomeruli. It can be seen that the optimal values are A = 0 for tubules (Figures S8a-b) and A = 2 for glomeruli (Figures S8c-d). These results suggest that while outputs from the Bayesian network ensemble scale linearly with class probabilities when class labels are abundant, this linear relationship breaks down when training data are limited, and some empirical corrections might be needed.
Note that Equation 3 serves to remove areas that are overconfident but would

S5. Native Biopsies
As the original datasets (QUOD and NMP) lack cases with moderate CKD, 12 cases of native biopsies were chosen to include patients with chronic changes.
Cases are given a qualitative description by the pathologist: 3 cases with no
IntelliSite scanner at x40 (0.25 mpp). These slides were only used to train the segmentation part of the pipeline.

S6. Handcrafted Features
A list of handcrafted features is shown in Table S7.

Table S8: Description of the featuresets presented in this study. Corresponds to Table 3 in the main text.

ResNet
Handcrafted combined with ResNet50 features pretrained with ImageNet

Figure S1: Distribution of grades given by a renal pathologist. Slides that do not contain enough glomeruli/vessels are not scored for that specific category. Figures S1a-S1d are standard Remuzzi Grades; Figure S1e assesses the overall acute damage in proximal tubules.

Figure S1: (cont.) Distribution of grades given by a renal pathologist.

Figure S2: Correlation between slide-level grades given by the pathologist. Spearman r values are shown. Remuzzi TA and IF are highly correlated (r = 0.973). However, some slides were graded for IF but not TA; these are not shown in the heatmap.

Figure S3: Distribution of the number of glomeruli in our datasets. There is an inherent trade-off between biopsy size and the risk of complications such as bleeding. Some slides contain multiple adjacent sections of the same biopsy; we manually identified these slides and avoided double counting these instances. The majority (332) of slides do not have enough glomeruli for assessment as stipulated by the Banff criteria.

Figure S4: Distribution of the number of arteries in the QUOD dataset (PAS-stained slides only). There is some discrepancy between the artery count and the slides that have received a Remuzzi A grade. This is possibly because some of the slides have partially truncated arteries.

Figure S6: Comparison of Mean AUC between Featuresets Unweighted vs Weighted by Segmentation Quality. The labelling scheme is the same as Figure S5.

Figure S7: Distribution of Local Grades for Proximal Tubules

Figure S8 shows segmentation results on the QUOD-PAS slides. The plots show how the soft values of the combined segmentation scale with A, the multiplier of σ in Equation 3. For each value of A, we compute a histogram that bins all pixels by the value p predicted by the UNet ensemble. The predicted probabilities p are plotted against the actual probabilities in (b) and (d) for different values of A. We then compute the L2 difference between the predicted probabilities p and the actual probabilities, as shown in (a) and (c).
not add to under-confident areas. Thus, the final number of tissues detected is likely to be underestimated. This bias is introduced to offset the asymmetrical consequences of a false-positive detection compared to a false-negative: while a false-negative may simply result in fewer tissues being processed, a false-positive detection would lead to misleading information being introduced into the workflow. The former case is far easier to deal with as we can simply flag a slide as "Needs Review" if we detect too few tissues.
(a) L2 vs A for the tubule class. (b) Soft values for the tubule class. (c) L2 vs A for the glomerulus class. (d) Soft values for the glomerulus class.
(a) and (c) show the L2 distance between the values predicted by the UNet ensemble and the actual probability that a pixel belongs to the tubule/glomerulus class at different values of A. (b) and (d) show how the calibrated soft values compare with the actual probability that a pixel belongs to a class.
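The calibration check described for Figure S8 can be sketched as follows. This is a minimal illustration under several assumptions: the soft values passed in have already been computed for a given A (Equation 3 is not reproduced here), the ground truth is a binary per-pixel label, and the bins are of equal width. The function names `reliability_curve` and `l2_distance` are hypothetical.

```python
import math


def reliability_curve(pred, truth, n_bins=10):
    """Bin pixels by predicted value and return, for each non-empty bin,
    the mean predicted value and the observed fraction of true pixels."""
    sums = [0.0] * n_bins
    hits = [0] * n_bins
    counts = [0] * n_bins
    for p, t in zip(pred, truth):
        b = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        sums[b] += p
        hits[b] += t
        counts[b] += 1
    predicted = [sums[b] / counts[b] for b in range(n_bins) if counts[b]]
    actual = [hits[b] / counts[b] for b in range(n_bins) if counts[b]]
    return predicted, actual


def l2_distance(predicted, actual):
    """L2 difference between predicted and observed probabilities."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)))


# Toy pixels: slightly under-confident predictions for a binary class.
soft_values = [0.1, 0.1, 0.9, 0.9]
labels = [0, 0, 1, 1]
predicted, actual = reliability_curve(soft_values, labels, n_bins=2)
score = l2_distance(predicted, actual)
```

Repeating this for each candidate A and keeping the value with the smallest L2 distance would correspond to reading off the minimum in panels (a) and (c).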

Figure S9: Scatter plot showing how kidneys from older donors tend to be matched to older recipients.
Table S7 (continued). Columns: #, Name: Description, n.
Area of slide in mm^2
6. biopsy area: Total area of biopsy tissues in mm^2
7. max dist: Maximum value of distance transform of the tissue, max(D)
8. nuclei density: Nuclei density; # of nuclei / area of tissue
9. nuclei moments centre max dist: Maximum distance of nuclei measured from centre of tissue
10. nuclei moments centre mean dist: Mean distance of nuclei measured from centre of tissue
11. nuclei moments centre min dist: Minimum distance of nuclei measured from centre of tissue
12. nuclei moments centre norm max dist: Maximum distance of nuclei measured from centre of tissue, normalised by max(D) for each tissue
13. nuclei moments centre norm mean dist: Mean distance of nuclei measured from centre of tissue, normalised by max(D) for each tissue
14. nuclei moments centre norm min dist: Minimum distance of nuclei measured from centre of tissue, normalised by max(D) for each tissue
nuclei moments kurtosis: Kurtosis of nuclei distribution from edge (n = 1)
nuclei moments max dist: Maximum distance of nuclei measured from edge of tissue (n = 1)
nuclei moments mean dist: Mean distance of nuclei measured from edge of tissue (n = 1)
nuclei moments min dist: Minimum distance of nuclei measured from edge of tissue (n = 1)
nuclei moments norm max dist: Maximum distance of nuclei measured from edge of tissue, normalised by max(D) for each tissue (n = 1)
nuclei moments norm mean dist: Mean distance of nuclei measured from edge of tissue, normalised by max(D) for each tissue (n = 1)
nuclei moments norm min dist: Minimum distance of nuclei measured from edge of tissue, normalised by max(D) for each tissue (n = 1)
nuclei moments norm variance: Variance of nuclei distance from edge, normalised by max(D) for each tissue (n = 1)
nuclei moments skewness: Skewness of nuclei distance from edge (n = 1)
nuclei moments variance: Variance of nuclei distance from edge (n = 1)
nuclei nnuclei: Number of nuclei per tissue (n = 1)
nuclei nuclei area 050percentile: Area of nuclei (n = 10)
nuclei nuclei col b 050percentile: Nuclei colour pixel values, blue channel (n = 10)
nuclei nuclei col g 050percentile: Nuclei colour pixel values, green channel (n = 10)
nuclei nuclei col r 050percentile: Nuclei colour pixel values, red channel (n = 10)
shape Ixx: sum(M_x * M_x) (n = 1)
shape Ixx norm: sum(M_x * M_x) / count(M) (n = 1)
shape Iyy: sum(M_y * M_y) (n = 1)
shape Iyy norm: sum(M_y * M_y) / count(M) (n = 1)
shape Izz: Moment of inertia of tissue, sum(M_x * M_x + M_y * M_y) (n = 1)
shape Izz norm: Moment of inertia of tissue, normalised by max(D). Larger value = more elongated, sum(M_x * M_x + M_y * M_y) / count(M) (n = 1)
shape aspect: Minor / Major Axis ratio (n = 1)
shape ax1: Major axis of tissue (n = 1)
shape ax1 norm: Major axis of tissue, normalised by max(D) (n = 1)
shape ax2: Minor axis of tissue (n = 1)
shape ax2 norm: Minor axis of tissue, normalised by max(D) (n = 1)
shape convex: Ratio of the tubule's mask over the convex hull of the mask (n = 1)
shape moment mask: np.sum(r1 * dist * mask) / np.sum(mask); r1 is radial distance from Centre of Mass of M
skewness: Spatially-weighted (D) skewness of cytoplasm pixel values, as measured from edge of tissue, normalised by size of tissue
56. tissue moment variance: Spatially-weighted (D) variance of cytoplasm pixel values, as measured from edge of tissue, normalised by size of tissue
57. tissue moments centre mean dist: Mean distance of cytoplasm pixel values, as measured from centre of tissue, (255 - img_1d[:, 0]) * (max(D) - D)
58. tissue moments centre norm mean dist: Mean distance of cytoplasm pixel values, as measured from centre of tissue, normalised, (255 - img_1d[:, 0]) * (max(D) - D) / max(D)
59. glom bm capsule area: Area of urinary space in glomerulus
60. glom bm capsule area ratio: (Area of Urinary Space) / (Area of Glomeruli)
61. ves lumen area: Lumen area in vessels
62. ves lumen ratio: Ratio of lumen to total area in vessels
Total number of features: 98
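As an illustration of the shape moment entries in Table S7, the sketch below computes Ixx, Iyy and Izz and their count-normalised versions from a binary mask. It assumes that M_x and M_y are the pixel coordinates of the mask measured relative to its centre of mass, which the table does not state explicitly. The function name `shape_moments` and the dictionary keys are hypothetical.

```python
def shape_moments(mask):
    """Second moments of a binary tissue mask given as a 2-D list."""
    pts = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    n = len(pts)
    if n == 0:
        raise ValueError("mask contains no foreground pixels")
    # Assumed convention: coordinates are centred on the centre of mass.
    cx = sum(x for x, _ in pts) / n
    cy = sum(y for _, y in pts) / n
    ixx = sum((x - cx) ** 2 for x, _ in pts)  # sum(M_x * M_x)
    iyy = sum((y - cy) ** 2 for _, y in pts)  # sum(M_y * M_y)
    izz = ixx + iyy                           # sum(M_x * M_x + M_y * M_y)
    return {
        "Ixx": ixx,
        "Ixx_norm": ixx / n,   # divided by count(M)
        "Iyy": iyy,
        "Iyy_norm": iyy / n,
        "Izz": izz,
        "Izz_norm": izz / n,
    }


# A 1x3 horizontal strip: all spread lies along x.
moments = shape_moments([[1, 1, 1]])
```

For a strip like this, Iyy is zero and Izz equals Ixx, which matches the intuition in the table that elongated tissues concentrate their moment along one axis.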

Table S3: Summary of Tissues Assessed by Pathologist.

Table S6: QUOD Donor Characteristics. n differs for each clinical variable as there are missing entries for some donors.
erate' chronic changes. Slides with inflammation, haemorrhage or potential drug effects are not present in these slides. All slides were scanned using a Philips

Table S7: List of handcrafted features extracted from tissues.