Ultrasound-Based Scoring System for Indication of Pyeloplasty in Patients With UPJO-Like Hydronephrosis

Background: Previous scoring systems have used renal scan parameters to assess the severity of ureteropelvic junction obstruction-like hydronephrosis (UPJO-like HN); however, this information is not always reliable because of protocol variation across centers and renogram limitations. We therefore sought to evaluate the Pyeloplasty Prediction Score (PPS), which uses only baseline ultrasound measurements to predict the likelihood of pyeloplasty in infants with UPJO-like HN. Methods: PPS was developed using three ultrasound parameters: Society for Fetal Urology (SFU) grade, transverse anteroposterior diameter (APD), and the absolute percentage difference between ipsilateral and contralateral renal lengths at baseline. PPS was evaluated using prospectively collected prenatal hydronephrosis data (n = 928), from which patients with UPJO-like HN were identified. Children with vesicoureteral reflux, primary megaureter, other associated anomalies, bilateral HN, or <3 months of follow-up were excluded. Scores were analyzed for their usefulness in predicting which patients would be more likely to undergo pyeloplasty. Sensitivity, specificity, likelihood ratios (LR), and the receiver operating characteristic (ROC) curve were determined. Results: Of 353 patients, 275 (78%) were male, 268 (76%) had left UPJO-like HN, and 81 (23%) had a pyeloplasty. The median age at baseline was 3 months (IQR 1-5). The PPS system was highly accurate in distinguishing patients who underwent pyeloplasty using baseline ultrasound measurements (AUC: 0.902). PPS cut-offs of 7 and 8 had sensitivities of 85 and 78% and specificities of 81 and 90%, respectively. A PPS of 8 was associated with an LR of 7.8, indicating that these patients were approximately eight times more likely to undergo pyeloplasty. Conclusion: Overall, PPS could detect patients more likely to undergo pyeloplasty using baseline ultrasound measurements. Those with a PPS of eight or higher were approximately eight times more likely to undergo pyeloplasty.


INTRODUCTION
Prenatal hydronephrosis is one of the most commonly detected ultrasound findings, affecting 1-5% of pregnancies, and is usually detected during the third trimester as an incidental finding (1). Ureteropelvic junction obstruction-like hydronephrosis (UPJO-like HN) is one of the most common congenital causes of prenatal hydronephrosis (1). If left untreated, severe hydronephrosis (HN) due to obstruction can lead to clinical symptoms such as urinary tract infections and hematuria, as well as progressive deterioration of renal function and permanent kidney damage (1-3). Thus, early detection and surgical intervention in UPJO cases provide benefit by reducing the length of time the kidney is obstructed. However, a large proportion of UPJO-like HN cases [isolated hydronephrosis (pelvic distension) with or without dilated calyces] are benign in nature and resolve spontaneously. Therefore, the challenge with UPJO-like HN patients is identifying, in a timely manner, those who warrant further testing and would benefit from intervention, in order to reduce associated morbidities.
Scoring systems have been developed to be utilized as an adjunctive tool to help predict those patients in need of pyeloplasty. Many of these scoring systems rely on ultrasound measurements of the afflicted kidney and on diuretic renogram findings. Nevertheless, nuclear studies pose an issue to the external validity of these scoring systems, as their protocols vary significantly across different centers (4). Consequently, interpretation of the drainage patterns, renogram curves and T1/2 times can be subjective and unreliable.
The primary objective of this study was to create a scoring system, the pyeloplasty prediction score (PPS), based on baseline ultrasound findings only, and to evaluate its utility in predicting pyeloplasty in infants with UPJO-like HN. We hypothesized that the proposed scoring system could discriminate those whose HN will resolve spontaneously from those who will end up having surgical intervention.

MATERIALS AND METHODS
After obtaining Research Ethics Board approval (13-62D), we reviewed our prospectively collected prenatal hydronephrosis database (n = 928) from a tertiary pediatric hospital and identified those who were diagnosed with UPJO-like hydronephrosis between 2008 and 2019. We excluded infants with vesicoureteral reflux, primary megaureter (hydroureteronephrosis), duplication anomalies, bilateral cases, other genitourinary anomalies (Prune-Belly, posterior urethral valves, horseshoe kidneys, neurogenic bladder, multicystic dysplastic kidney), and those with <3 months of follow-up.

Calculation of the Pyeloplasty Prediction Score
For each case, characteristics were collected, and only baseline (initial visit) ultrasound measurements were analyzed. The PPS scoring system was then retrospectively applied to each included case. Ultrasound measurements followed institutional protocol: no pretest patient hydration (to minimize measurement bias), the patient in the supine position, measurements pre- and post-void to confirm an empty bladder, the same two ultrasound machines, and two technicians specialized in pediatric renal bladder ultrasound, who were specifically assigned to the Pediatric Urology Service.
The clinical outcome was resolution of HN or surgical intervention with a pyeloplasty. HN resolution was defined as two consecutive ultrasounds showing either Society for Fetal Urology (SFU) grade 1 or less, or renal pelvis anteroposterior diameter (APD) of 10 mm or less (5). Indications for pyeloplasty were based on the following protocol: (1) worsening of hydronephrosis, characterized by an increase in the transverse APD of the renal pelvis with or without an increase in SFU grade on repeat ultrasounds; (2) deterioration of differential renal function (DRF) >10% on repeated renal scans; (3) initial renal function <40% associated with an obstructive (ascending) curve on renogram; (4) worsening of hydronephrosis associated with a T1/2 time >30 min; or (5) development of symptoms (sepsis, febrile urinary tract infections, stones).
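The resolution definition above can be expressed as a simple check over a patient's ultrasound series. The following is a minimal sketch, not the study's code: the `Ultrasound` record and function names are hypothetical, and it assumes that each of the two consecutive ultrasounds may satisfy either criterion (SFU grade 1 or less, or APD of 10 mm or less).

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Ultrasound:
    """One follow-up ultrasound (hypothetical record layout)."""
    sfu_grade: int  # 0 (normal) to 4
    apd_mm: float   # transverse anteroposterior diameter, in mm


def meets_resolution_criterion(us: Ultrasound) -> bool:
    """SFU grade 1 or less, or APD of 10 mm or less."""
    return us.sfu_grade <= 1 or us.apd_mm <= 10.0


def is_resolved(series: Sequence[Ultrasound]) -> bool:
    """True if any two consecutive ultrasounds both meet the criterion.

    Assumption: each of the two ultrasounds may meet either criterion.
    """
    return any(
        meets_resolution_criterion(a) and meets_resolution_criterion(b)
        for a, b in zip(series, series[1:])
    )
```

A series with only one qualifying ultrasound, or with two qualifying ultrasounds separated by a non-qualifying one, is not counted as resolved under this reading.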
PPS was based on three widely used ultrasound variables at baseline: SFU grade, transverse APD, and absolute percentage difference in renal length. APD was calculated in the transverse view by measuring the distance between the parenchymal lips at the renal hilum in the mid-section. The extra-renal and intrarenal measurements for APD were taken, and the larger of the two was recorded in the prospective database to be used in the PPS calculation. Renal length was measured in the ipsilateral and contralateral kidneys in the longitudinal view, such that the distance between the most distant points of the upper and lower poles was captured. Figures 1A,B demonstrate the technique for measuring renal length and APD using electronic calipers, respectively. Each of these variables was assigned a value out of four, with zero being a normal variant or least severe and four being the most severe, so that the total PPS ranges from 0 to 12. The SFU grading system ranges from normal through grades 1, 2, 3, and 4, which corresponded to scores of 0, 1, 2, 3, and 4, respectively (6).
The APD measurement was grouped as <5, 5-10, 11-15, 16-19, and ≥20 mm, corresponding to scores of 0, 1, 2, 3, and 4, respectively. The APD category values were established based on current evidence that, generally, the larger the APD, the greater the risk of obstructive uropathy (7-9), and thus the greater the likelihood of surgical intervention (8,10-12). An APD <5 mm is not considered HN, which is why a score of 0 was assigned. A post-natal APD value <10 mm is considered physiologic HN, and APD values from 10 to 15 mm are associated with a low risk of obstructive uropathy; both correspond to the Urinary Tract Dilation (UTD) Classification System's P1 designation (13). The P1 designation is the lowest risk stratum in the UTD post-natal classification, which is why those two APD categories have been assigned the lower severity scores of 1 and 2, respectively. An APD value of 16 mm or greater was found by Dias et al. to have the best diagnostic odds ratio for identifying infants who had pyeloplasty performed, which corresponds to the category of 16-19 mm (10). Literature sources vary in terms of which APD cutoff value carries the greatest likelihood of surgery, but they are generally consistent in that an APD of at least 20 mm is associated with the highest likelihood of pyeloplasty, which is why it was assigned the greatest score of 4 (14,15).
Absolute percentage difference between renal lengths was grouped as <5% (error variation), 5-10, 11-15, 16-19, and ≥20%, corresponding to scores of 0, 1, 2, 3, and 4, respectively. The absolute percentage difference was calculated from the ipsilateral and contralateral renal lengths measured at baseline. The three scores were then summed for a total score out of 12. The details of the scoring criteria are described in Table 1.
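The scoring rules described in this section can be summarized in a short sketch. This is an illustration under stated assumptions rather than the authors' implementation: all function names are hypothetical; the percentage difference is assumed to use the contralateral renal length as the denominator (the text does not specify it); and fractional values that fall between the integer bands (for example, an APD of 10.4 mm) are assumed to belong to the lower band.

```python
def sfu_score(sfu_grade: int) -> int:
    """SFU grade (0 = normal, 1-4) maps directly to a score of 0-4."""
    if sfu_grade not in (0, 1, 2, 3, 4):
        raise ValueError("SFU grade must be an integer from 0 to 4")
    return sfu_grade


def banded_score(value: float) -> int:
    """Score 0-4 for the bands <5, 5-10, 11-15, 16-19, >=20.

    The same bands are used for APD (mm) and for the absolute
    percentage difference in renal length (%).
    Assumption: values between integer bands go to the lower band.
    """
    if value < 0:
        raise ValueError("value must be non-negative")
    if value < 5:
        return 0
    if value < 11:
        return 1
    if value < 16:
        return 2
    if value < 20:
        return 3
    return 4


def renal_length_percent_difference(ipsi_mm: float, contra_mm: float) -> float:
    """Absolute percentage difference between renal lengths.

    Assumption: the contralateral length is the denominator.
    """
    if contra_mm <= 0:
        raise ValueError("contralateral length must be positive")
    return abs(ipsi_mm - contra_mm) * 100.0 / contra_mm


def pyeloplasty_prediction_score(
    sfu_grade: int, apd_mm: float, ipsi_mm: float, contra_mm: float
) -> int:
    """Sum of the three component scores (total range 0 to 12)."""
    length_diff = renal_length_percent_difference(ipsi_mm, contra_mm)
    return sfu_score(sfu_grade) + banded_score(apd_mm) + banded_score(length_diff)
```

For example, under these assumptions an infant with SFU grade 2, an APD of 12 mm, and renal lengths of 55 mm (ipsilateral) and 50 mm (contralateral) would score 2 + 2 + 1 = 5.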
The PPS system was analyzed for its usefulness in predicting which patients were more likely to undergo pyeloplasty. A trial of various cut-points was done to establish an optimal threshold that would maximize the sensitivity and specificity of the scoring system. Sensitivity, specificity, and likelihood ratios (LR) with their corresponding 95% confidence intervals (CI), as well as a receiver operating characteristic (ROC) curve, were determined. A p-value ≤0.05 was considered statistically significant. SPSS version 26 (www.ibm.com) was used for analysis.
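The cut-point trial described above amounts to computing sensitivity and specificity for each candidate threshold. The sketch below illustrates that calculation; it is not the SPSS analysis used in the study, the function name is hypothetical, a score at or above the cut-off is assumed to count as a positive test, and the small data set is invented purely for illustration.

```python
from typing import Sequence, Tuple


def sens_spec_at_cutoff(
    scores: Sequence[int], had_pyeloplasty: Sequence[bool], cutoff: int
) -> Tuple[float, float]:
    """Sensitivity and specificity when score >= cutoff is a positive test."""
    tp = fn = tn = fp = 0
    for score, surgery in zip(scores, had_pyeloplasty):
        positive = score >= cutoff
        if surgery and positive:
            tp += 1
        elif surgery:
            fn += 1
        elif positive:
            fp += 1
        else:
            tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one surgical and one non-surgical case")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical toy data (not study data): PPS values and outcomes.
TOY_SCORES = [3, 5, 6, 7, 8, 9, 10, 4, 8, 11]
TOY_SURGERY = [False, False, False, False, False, True, True, False, True, True]

if __name__ == "__main__":
    for cutoff in range(1, 13):
        sens, spec = sens_spec_at_cutoff(TOY_SCORES, TOY_SURGERY, cutoff)
        print(f"cut-off {cutoff:2d}: sensitivity {sens:.2f}, specificity {spec:.2f}")
```

Raising the cut-off trades sensitivity for specificity, which is why a range of thresholds is examined before one is chosen.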

RESULTS
Overall, from 928 prenatal HN patients in our database, a total of 353 with UPJO-like (isolated HN) qualified for analysis based on inclusion and exclusion criteria. Of the 353 included infants, 275 (78%) were male and 268 had HN on the left side (76%). The median age of the cohort at baseline (initial visit) was 3 months (IQR 1-5 months). In 81 of the 353 patients (23%), kidneys were considered obstructed based on our criteria (previously stated in the methods section), and a pyeloplasty was performed.
The area under the ROC curve (AUC) was 0.902, demonstrating the accuracy of the PPS in identifying patients more likely to undergo a pyeloplasty (Figure 2A). The PPS could result in a score of 1 to 12. Through testing of various modeling scenarios, a score of 7-8 was found to be the optimal cut-off point, with the highest levels of sensitivity and specificity for discriminating patients who would likely be candidates for a pyeloplasty. The sensitivities of PPS cut-offs of 7 and 8 were 85 and 78%, respectively (Figure 2B). The specificities of PPS cut-offs of 7 and 8 were 81 and 90%, respectively (Figure 2B).
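For readers less familiar with the AUC, it can be read as the probability that a randomly chosen patient who underwent pyeloplasty has a higher score than a randomly chosen patient who did not, with ties counted as one half. The sketch below computes the empirical AUC in that pairwise form; the function name and the toy data are hypothetical and are not taken from the study.

```python
from typing import Sequence


def empirical_auc(scores: Sequence[float], had_pyeloplasty: Sequence[bool]) -> float:
    """Empirical AUC as the fraction of (surgical, non-surgical) pairs
    in which the surgical case has the higher score (ties count 0.5)."""
    pos = [s for s, y in zip(scores, had_pyeloplasty) if y]
    neg = [s for s, y in zip(scores, had_pyeloplasty) if not y]
    if not pos or not neg:
        raise ValueError("need at least one case in each outcome group")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (not study data).
TOY_SCORES = [3, 5, 6, 7, 8, 9, 10, 4, 8, 11]
TOY_SURGERY = [False, False, False, False, False, True, True, False, True, True]

if __name__ == "__main__":
    print(f"AUC = {empirical_auc(TOY_SCORES, TOY_SURGERY):.3f}")
```

An AUC of 0.5 corresponds to no discrimination and 1.0 to perfect separation of the two groups.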
The LRs across the PPS range (1-12) increased progressively as the score increased, as expected. The optimal cut-point score of 8 was found to have an LR of 7.8 (Figure 3). Based on LR values, we stratified the patients into three risk categories, according to the likelihood of undergoing pyeloplasty (Figure 3).
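The relationship between the reported sensitivity, specificity, and LR can be checked with the standard likelihood ratio formulas. The sketch below assumes that the reported LR is the positive likelihood ratio, LR+ = sensitivity / (1 - specificity); the function name is hypothetical.

```python
from typing import Tuple


def likelihood_ratios(sensitivity: float, specificity: float) -> Tuple[float, float]:
    """Return (LR+, LR-) for proportions strictly between 0 and 1."""
    if not (0.0 < sensitivity < 1.0 and 0.0 < specificity < 1.0):
        raise ValueError("sensitivity and specificity must be between 0 and 1")
    lr_positive = sensitivity / (1.0 - specificity)
    lr_negative = (1.0 - sensitivity) / specificity
    return lr_positive, lr_negative


if __name__ == "__main__":
    # Sensitivity and specificity as reported for cut-offs of 7 and 8.
    for cutoff, sens, spec in [(7, 0.85, 0.81), (8, 0.78, 0.90)]:
        lr_pos, lr_neg = likelihood_ratios(sens, spec)
        print(f"PPS >= {cutoff}: LR+ = {lr_pos:.2f}, LR- = {lr_neg:.2f}")
```

With the reported values for a cut-off of 8 (sensitivity 78%, specificity 90%), this gives an LR+ of approximately 7.8, which matches the LR reported above; for a cut-off of 7 the same formula gives approximately 4.5.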

DISCUSSION
The present study involved the development and analysis of a prediction scoring system for pyeloplasty in UPJO-like HN using only baseline ultrasound characteristics. Our findings show that PPS was highly accurate in distinguishing patients who ended up having a pyeloplasty from those managed non-surgically. Based on our findings, the optimal cut-off point at which pediatric urologists could consider indicating a pyeloplasty is a PPS ≥8, provided they follow the same pyeloplasty indications as outlined in our protocol. Only two patients were recorded as false negatives, meaning that the PPS was below eight at baseline, yet they eventually had pyeloplasty. These two cases initially presented with very mild hydronephrosis, which worsened over repeated follow-up. As previously described, one of the indications for pyeloplasty at our institution is worsening of hydronephrosis by APD or SFU grade, which is why these two patients qualified for surgical intervention. With respect to false positives, there were no patients who scored above 8 and did not have surgical intervention. Based on the sensitivity and specificity calculations, clinicians can expect a 90% probability that those with a score ≥8 will end up having a pyeloplasty in the future. The LR indicated that patients with a PPS ≥8 were eight times more likely to have surgery vs. no surgery.

Indications for Pyeloplasty
Pyeloplasty indications have been well established in the main urological textbooks. According to the Campbell-Walsh 12th edition textbook, widely accepted indications for pyeloplasty include "increasing APD on ultrasonography, low or decreasing DRF, breakthrough infections while on prophylactic antibiotics, or symptomatic hydronephrosis in older infants and children" (16).
Nevertheless, controversy still surrounds some of these indications because of their inherent subjectivity. "Low or decreasing" differential renal function does not specify an actual value for low function or for the size of the decrease, so how low, or how large a decrease, indicates the need for pyeloplasty is to some degree subjective. Some authors may consider <40% DRF as a cut-off (17), while others may push it even lower, to <35% (18). The same subjectivity arises with an increase in the APD of the renal pelvis. At what APD value, and at what rate of increase, does pyeloplasty outweigh non-surgical management? Again, these values vary from surgeon to surgeon and are the subject of many debates within pediatric urology.
Historically, decreased or decreasing renal function as an indication for pyeloplasty had been controversial (17). Waiting until function has dropped and then performing surgery with the hopes to regain what has already been lost seems to be contradictory to the philosophy in pediatrics of maximizing a child's potential (19). While this view does not convey the thought that surgery should be performed on every child, this does highlight the need for a more advanced measure for screening patients that would benefit significantly from early surgical intervention rather than observation.

Early Intervention Compared to Non-surgical Management
The argument against early intervention in UPJO-like HN rests on evidence demonstrating that most cases of UPJO-like HN are clinically benign and will self-resolve. Koff followed neonates with suspected UPJO-like HN (regardless of degree of HN, shape of diuretic renogram curve, or initial degree of functional impairment) and showed that only 7% eventually had pyeloplasty performed for obstruction, suggesting that, due to diagnostic inaccuracy and the low risk of developing obstructive injury, many newborn kidneys with HN may rapidly improve without intervention (20,21). This was further validated by Onen et al., who followed 19 newborns (38 kidneys) with primary SFU grade 3 to 4 bilateral HN for a mean of 54 months. Overall, 25 hydronephrotic kidneys (65%) resolved spontaneously, with renal dilatation and function improving over time in most kidneys (22). Furthermore, Braga et al. analyzed a cohort of 501 UPJO-like HN patients with all SFU grades and observed that 68% of those with grades 3 and 4 HN resolved with non-surgical management over 48 months of follow-up (23). This rate compares well to a recent study from a center known for its conservative approach regarding pyeloplasty indications, which reported a pyeloplasty rate of 38% in 64 patients with grades 3/4 UPJO-like HN at a median age of 21 months (24).
In contrast, the benefit of early pyeloplasty in UPJO-like HN has been widely reported in the literature. With respect to renal function, Babu et al. compared children with UPJO-like HN and SFU grade 3 or 4 who had pyeloplasty done at a mean age of 2.8 vs. 12.5 months. They found that at 1-year follow-up, the early group had a significant improvement in split DRF, while the delayed group generally had a marginal loss in function (25). Tabari et al. analyzed functional and anatomic indices (cortical thickness, polar length, SFU grade) in patients with early surgical pyeloplasty compared to those with non-surgical management. The early surgical group showed a faster return to anatomical and functional baseline parameters, whereas the non-surgical group had a significant deterioration in function compared to baseline (26). Thus, it is clinically essential to be able to identify those patients with UPJO-like HN who would benefit most from early pyeloplasty, which is exactly what the PPS was intended to do.

Limitations of Other Scoring Systems
Other scoring systems, such as the Hydronephrosis Severity Score (HSS) developed by Babu et al., attempted to predict pyeloplasty using ultrasound and diuretic renogram results (27). The main limitation of the HSS is that it relies greatly on the diuretic renogram and the interpretation of its curve, both of which are heavily exam- and operator-dependent. Confounders such as timing of the furosemide dose (F+20, F-15, F+0), bladder catheterization or no catheterization, oral or intravenous hydration, DRF of the affected kidney, and conjugate views may all influence the results of the scan (4, 28-31).
Bladder distension and elevated bladder pressures can restrict the upper urinary tract's ability to drain and can prolong the excretory phase, which is difficult to control without bladder catheterization. Patient position has also been demonstrated to affect urine flow: when the patient is supine, the urine flow can resemble obstruction, whereas an upright, gravity-assisted position can increase flow significantly (29). Timing of furosemide administration is controversial. With earlier furosemide administration (F+0, F-15), urine flow is dramatically increased, which can increase specificity by decreasing the false-positive rate but also results in underestimation of renal function due to acceleration of renal transit (30,32). Later administration of furosemide (F+20) allows the examiner to compare the drainage curve before and after furosemide and to directly observe how the diuretic modifies excretion. However, prolongation of the excretory phase does carry an increased risk of false-positive findings of obstruction (28). It is not difficult to imagine how many protocol variations can be expected across different centers, even from variability in just one of these three factors. This will lead to inconsistencies when interpreting study results involving different protocols and radiotracers.

Pyeloplasty Prediction Score Parameters
Therefore, the concept of creating a score relying exclusively on ultrasound parameters was attractive because of its reproducibility. SFU grade, APD, and renal length discrepancy measurements were chosen as the components of the PPS system because each one of them has been shown to be significantly associated with obstruction/pyeloplasty, as previously reported. Increasing severity of SFU grade, specifically SFU grades 3 and 4 in post-natal UPJO-like HN, has been shown to be an independent risk factor for surgery (6,23,33,34). In a prospective study including 501 UPJO-like HN patients, Braga et al. showed that the pyeloplasty rate in patients with SFU grades 3 and 4 was significantly higher than that in those with SFU grades 1 and 2 (32% vs. 2%) (23). In a meta-analysis, Lee et al. demonstrated that severe hydronephrosis (antenatal APD >15 mm) found during the third trimester had an 88% chance of post-natal pathology (35). Dias et al. also established that with a prenatal APD >18 mm in the third trimester and >16 mm in the postnatal period, the sensitivity and specificity for eventually needing pyeloplasty for UPJO-like HN were 100 and 86%, respectively (10). Renal length discrepancy on ultrasound has already been shown to be a reliable predictor of abnormal DMSA scans, representing function, and of SFU grade, representing obstructive severity. Khazaei et al. showed that, in children of all ages, a left kidney longer than the right by ≥10 mm or a right kidney longer than the left by ≥7 mm corresponded to positive predictive values (PPV) of 79 and 100%, respectively, for an abnormal DMSA scan (36). Kelley et al. found that an increase in renal length was significantly associated with SFU grades 3 and 4 as compared to SFU grades 1 and 2 (37). The three parameters chosen for the PPS have thus been shown to capture significant anatomical and functional measures independently, so the next logical step was to combine them into a single scoring system.
Though drop in differential renal function (DRF) is commonly listed as an indication for pyeloplasty, it has been omitted from the PPS formula. DRF can occasionally be misleading with the supra-normal differential renal function (SNDRF) phenomenon. A finding of SNDRF is generally defined as when the hydronephrotic kidney is found to have higher than normal DRF (>55%) (38,39). It is hypothesized that this finding does not reflect true elevated function but reflects hyper-filtration in the setting of obstruction (38). SNDRF has been found in studies to be associated with significant post-operative decrease in DRF (38,40). Pippi Salle et al. suggested that SNDRF observed during renography is a true phenomenon and that parenchymal proximity and distribution in relation to the pelvis are critical determinants, thus recommending the conjugate view technique for HN renography (41). There is intrinsic measurement error in renal scans of hydronephrotic kidneys making DRF measurement unreliable, due to variation in technique and the presence of the SNDRF phenomenon. Thus, DRF measurements do not have a consistent unidirectional relationship with disease severity that can be effectively utilized in a prediction model such as with the PPS.

Limitations
The main limitation of this study is that despite including widely accepted parameters that vary according to the severity of UPJO-like HN as components of the PPS, surgery indications are operator-dependent. A surgeon can determine his or her own criteria for pyeloplasty with some degree of flexibility from guidelines. Thus, the PPS system should be adopted for research at other centers for evaluation of the external validity of its predictive abilities.
Another limitation of the present study is the ongoing debate regarding the use of pyeloplasty as an outcome in single-center studies involving UPJO-like HN. Those who are against using surgery as an outcome argue that pyeloplasty inherently reflects a surgeon's threshold for surgery rather than an objective point of need for surgery. However, pyeloplasty is one of the few concrete outcomes available in the natural history of UPJO-like HN. If pyeloplasty cannot be considered as an outcome, no other concrete objective outcomes are currently available, with the exception of renal function loss and symptoms. As previously discussed, waiting for renal function to deteriorate before indicating surgery, in the hope of regaining what has already been lost, seems counter-intuitive, especially when nephron preservation is the goal. Using an objective criterion for surgery such as DRF deterioration has its own problems. A recent study, which used DRF <40% as the main indication for pyeloplasty regardless of HN grade and APD, showed a much higher febrile UTI rate of 12.5% for patients followed non-surgically when compared to previous studies (24). This abnormally high UTI rate, which can be considered a true outlier, was most likely secondary to waiting too long for renal function loss to occur before intervening.
The PPS system was tested with a dataset from a single tertiary pediatric hospital. In order to further assess its external validity, it should be verified at other centers with prospectively collected data and larger sample sizes.
Despite these limitations, we propose that there is value in attempting to predict which UPJO-like HN patients will undergo pyeloplasty using the PPS. We encourage the adoption of this scoring system at other centers to verify its findings and, possibly, to establish an objective, simple, standard measure to quantitatively compare thresholds for surgery between pediatric urologists.

DATA AVAILABILITY STATEMENT
The data analyzed in this study is subject to the following licenses/restrictions: the authors are not allowed to share data outside their institution without a data sharing agreement. Requests to access these datasets should be directed to Melissa McGrath, mcgram2@mcmaster.ca.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by Hamilton Integrated Research Ethics Board. No consent from the participant was required, as the study is part of a long term ongoing database.

AUTHOR CONTRIBUTIONS
LB theorized the presented idea. MM and BL developed the theory and performed the computations. LB and FF verified the analytical methods. BL, MM, FF, and LB contributed to interpretation of the results. BL wrote the manuscript in consultation with MM, FF, and LB. LB supervised the project. All authors provided critical feedback and helped design the research, analysis, manuscript, and figures.