SYmptom-Based STratification of DiabEtes Mellitus by Renal Function Decline (SYSTEM): A Retrospective Cohort Study and Modeling Assessment

Background: Previous UK Biobank studies showed that symptoms and physical measurements predicted long-term clinical outcomes well in the general population. Symptoms and signs could intuitively and non-invasively predict and monitor disease progression, especially in telemedicine, but related research is limited in diabetes and renal medicine. Methods: This retrospective cohort study aimed to evaluate the predictive power of a symptom-based stratification framework and individual symptoms for diabetes. Three hundred two adult diabetic patients were consecutively sampled from outpatient clinics in Hong Kong for prospective symptom assessment. Demographics and longitudinal measures of biochemical parameters were retrospectively extracted from linked medical records. The association between estimated glomerular filtration rate (GFR) (dependent variable) and biochemistry, epidemiological factors, and individual symptoms was assessed by mixed regression analyses. A symptom-based stratification framework of diabetes using symptom clusters was formulated by the Delphi consensus method. Akaike information criterion (AIC) and Bayesian information criterion (BIC) were compared between statistical models with different combinations of biochemical, epidemiological, and symptom variables. Results: In the 4.2-year follow-up period, baseline presentation of edema (−1.8 ml/min/1.73m2, 95%CI: −2.5 to −1.2, p < 0.001), epigastric bloating (−0.8 ml/min/1.73m2, 95%CI: −1.4 to −0.2, p = 0.014) and alternating dry and loose stool (−1.1 ml/min/1.73m2, 95%CI: −1.9 to −0.4, p = 0.004) were independently associated with faster annual GFR decline. Eleven symptom clusters were identified from the literature, stratifying diabetes predominantly by gastrointestinal phenotypes. Using symptom clusters synchronized by Delphi consensus as the independent variable in statistical models reduced complexity and improved explanatory power when compared to using individual symptoms. The symptom-biologic-epidemiologic combined model had the lowest AIC (4,478 vs. 5,824 vs. 4,966 vs. 7,926) and BIC (4,597 vs. 5,870 vs. 5,065 vs. 8,026) compared to the symptom, symptom-epidemiologic and biologic-epidemiologic models, respectively. Co-presentation of a constellation of fatigue, malaise, dry mouth, and dry throat was independently associated with faster annual GFR decline (−1.1 ml/min/1.73m2, 95%CI: −1.9 to −0.2, p = 0.011). Conclusions: Add-on symptom-based diagnosis improves the prediction of renal function decline among diabetic patients on top of key biochemical and epidemiological factors. Dynamic change of symptoms should be considered in clinical practice and research design.


INTRODUCTION
The pathogenesis of diabetes is heterogeneous and there is a constant call for more personalized management (1,2). The majority of existing studies stratified diabetes at the molecular level (1). Previous UK Biobank studies suggested that phenomes, including symptoms and physical measures, predicted long-term clinical outcomes well in the general population (3). Network medicine integrates phenome and genome to reclassify disease as the basis of personalized medicine (4,5). However, symptomatology is less explored in diabetes and kidney diseases.
The SHIELD study showed that each of the 7 common diabetes symptoms is associated with onset of diabetes among the non-diabetic population (21). Further expanding the scope of symptom analysis could identify symptoms that are specific (and therefore less commonly presented) to disease progression. However, untargeted mining of symptoms (e.g., by latent class analysis) requires an exponentially larger sample size to compensate for the measurement errors that arise from assessing multiple individual symptoms. Supervised and targeted analysis with qualitative inputs from a clinical perspective (e.g., by a Delphi consensus panel) could reduce the large number of symptoms to fewer symptom clusters for more efficient statistical modeling and clinical use. A similar dimension reduction strategy from genotype to phenotype has been used in deep learning methods to reduce the sample size required to predict personalized drug response (22).
Abbreviations: AIC, Akaike information criterion; BIC, Bayesian information criterion; CI, confidence interval; CKD, chronic kidney disease; CM, Chinese medicine; DKD, diabetic kidney disease; GFR, glomerular filtration rate; HbA1c, hemoglobin A1c; SS, symptom-based subtype; UACR, urine albumin-to-creatinine ratio.
Chinese medicine (CM) has a long history in symptom-based disease management as biochemical and molecular investigations were not available at the time of theory development (23)(24)(25). Previous big data studies showed that add-on symptom-based CM treatment led to reduced risk of end-stage kidney disease and mortality among chronic kidney disease (CKD) patients with diabetes (26)(27)(28). CM subclassifies diseases based on different symptom combinations on top of conventional medicine diagnosis to formulate personalized regimens (12,17,29). In each symptom-based subtype (SS) of diabetes, related symptoms are weighed into core and supplementary criteria based on their importance to diagnosis (30). Although the CM stratification system has the potential to refine diabetes classification, there is no synchronized framework. The benefit of add-on symptom-based stratification remains undetermined for diabetes.
Symptoms could serve as an intuitive and non-invasive tool to predict and monitor clinical progression, especially in telemedicine. A previous study showed that retarding renal function decline is the priority among diabetic patients with no/early CKD (25). This study aimed to investigate the association between the presence of symptoms, SSs and renal function deterioration among diabetic patients, and to establish a synchronized symptom-based framework for stratifying diabetes based on the quality of different statistical models in predicting renal function.

Study Design
This was a retrospective cohort study on the cross-sectional and longitudinal association between renal function and risk factors (Figure 2). The Delphi technique (31)(32)(33) was used as the consensus methodology to construct a symptom-based classification framework for diabetes that comprised symptom clusters. The explanatory power and complexity of statistical models with different combinations of biochemical, epidemiological, and symptom variables were compared by information criteria used in machine learning.
FIGURE 1 | Schematic diagram of association between pathogenesis, biochemical parameters, and symptoms. The pathogenesis of diabetes is multifaceted and symptomatic presentation could reflect the whole-system response.

Participants
Three hundred two diabetes subjects aged 18 or above on antidiabetic medication were recruited consecutively from general outpatient clinics of public hospitals in Hong Kong through a 30-min health service programme from July 2014 to December 2019. Although there is no consensus on sample size calculation for comparing model quality, modeling bias has been shown empirically to be small and independent of sampling design when the sample size is over 100 (34).
Ten conventional medicine and CM dually trained experts in internal medicine and classical CM theory (Supplementary Material 6) were purposively sampled to join the Delphi consensus panel to diversify the schools of theory on symptom-based diabetes diagnosis.

Data Collection
Ninety-one symptoms, signs and behavioral factors (Supplementary Material 5) were extracted from existing literature. The presence of symptoms was prospectively collected face-to-face by physicians based on a standardized, structured, and field-tested questionnaire through the service programme (35). Presence of symptoms was quantified by the days of presence within 2 weeks of consultation. Diagnosis of the 11 SSs was based on the presence of symptoms with reference to the symptom-based diagnosis framework derived from the expert consensus described below. To ensure data consistency, only 2 trained and synchronized physicians collected the data.
Biochemistry and demographics (listed in Data Analysis) of patients were retrieved from linked electronic medical records, including a comprehensive and standardized annual multidisciplinary Risk Assessment and Management Programme (36). The Programme captures patients' demographics assessed by trained nurses, and the continuous change of key biochemical parameters. All outcome assessors were blinded to the analysis plan and biochemistry data to avoid bias. Longitudinal data were retrieved from the electronic medical system until 1 December 2020 by 2 researchers (K.W.C., K.Y.Y.) independently with a standardized form.

Delphi Consensus Process and Questionnaire
Consensus methods are commonly used in constructing agreeable frameworks of existing practice in medicine (31). The Delphi technique, which involves iterative anonymous rating, has advantages in handling dominant opinion and geographical constraints (32). The consensus panel was formed based on the experts' academic capacity, peer reputation, geographical coverage, and school of thought to maximize the coverage of opinion while being specific to diabetes. A panel of 10 was recruited to maximize participation and consensus efficiency (32,37). The consensus result was used to reduce the dimension of symptom analysis from individual symptoms to symptom clusters.
FIGURE 2 | Flow diagram of research. Symptoms of diabetes were analyzed as individual symptoms and symptom clusters. Presence of 91 symptoms was recorded with a field-tested questionnaire by 2 synchronized physicians. Symptom clusters were constructed by extraction from literature followed by a consensus panel. The explanatory power and complexity of different models with combinations of symptoms, biochemistry, and demographics were compared by Akaike information criterion and Bayesian information criterion used in machine learning.
The Delphi consensus was divided into 2 rounds. A questionnaire with a Likert scale of 1-9 was designed as the medium for rating (Supplementary Material 7). Since CM has a rich history of using symptom-based diagnosis (23,24), we sought to expand the scope of symptom analysis by extracting CM literature. Ninety-one baseline symptoms were extracted from all 11 documented SSs from key symptom-based diabetes guidelines published by academies, academic societies, and government regulatory bodies (Supplementary Material 2) (38)(39)(40)(41), and standardized by a national standard of medical terminology (42). The baseline questionnaire contained 300 symptoms and could be completed in 30 min in field tests.
In the first round, experts anonymously rated the importance of the symptoms in diagnosing each SS and were invited to supplement necessary symptoms. Consensus was pre-defined as a two-thirds majority rating 1-3 (agreed as insignificant) or 7-9 (agreed as significant). Symptoms that reached consensus were then removed from the questionnaire. The result of the first round was disseminated back to and confirmed by the consensus panel. Dissonant symptoms were circulated again in the second round along with the additional symptoms collected in the first round. Symptoms that reached consensus in either round were defined as core criteria for the diagnosis of the corresponding SS. Sensitivity analyses with a different consensus definition (average score of 6.7) and different handling of missing values (last-observation-carried-forward and extreme values) were conducted. Symptoms that reached consensus in sensitivity analyses were defined as supplementary criteria. Results were confirmed by panel members.
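The consensus rule described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, and it assumes that "two-thirds majority" means at least two thirds of the ratings received and that the sensitivity definition means a mean score of at least 6.7.

```python
from statistics import mean

def classify_consensus(ratings):
    """Apply the two-thirds majority rule to one symptom's 1-9 Likert ratings.

    Assumption: 'majority' is read as >= 2/3 of the ratings received.
    """
    n = len(ratings)
    low = sum(1 for r in ratings if 1 <= r <= 3)    # rated insignificant
    high = sum(1 for r in ratings if 7 <= r <= 9)   # rated significant
    # Integer comparison avoids floating-point issues at exactly 2/3.
    if 3 * high >= 2 * n:
        return "significant"
    if 3 * low >= 2 * n:
        return "insignificant"
    return "no consensus"

def meets_mean_threshold(ratings, threshold=6.7):
    """Sensitivity definition: mean score at or above the threshold (assumed >=)."""
    return mean(ratings) >= threshold

# Hypothetical ratings from a 10-member panel for a single symptom.
example = [8, 7, 9, 7, 8, 7, 9, 5, 4, 2]
print(classify_consensus(example), meets_mean_threshold(example))
```

Under this reading, a symptom with 7 of 10 ratings in the 7-9 band reaches consensus as significant, while 6 of 10 does not and would be recirculated in the second round.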

Data Analysis
Estimated GFR was calculated by the CKD-EPI equation (43). Education (no formal/primary/secondary/tertiary), type of diabetes (type 1/type 2/gestational), obesity (optimal weight/underweight/overweight/moderately to severely obese), smoking history (non-/ex-/current smoker), and alcohol history (non-/ex-/social/current drinker) were further stratified. Univariable and backward stepwise multivariable regression models on the correlation between GFR and binary presence of individual symptoms were evaluated. For the cross-sectional analysis, missing values were replaced by multiple imputation with 20 sets of data using age as the auxiliary variable.
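For readers unfamiliar with the estimating equation, the sketch below shows one common form. The text only cites "the CKD-EPI equation (43)" without naming a version, so this assumes the 2009 creatinine-based CKD-EPI equation with serum creatinine in mg/dL; the function name and arguments are illustrative.

```python
def ckd_epi_2009(scr_mg_dl, age_years, female, black=False):
    """Estimated GFR (ml/min/1.73m2) by the 2009 CKD-EPI creatinine equation.

    Assumption: the study's reference (43) corresponds to this version.
    """
    kappa = 0.7 if female else 0.9          # sex-specific creatinine scaling
    alpha = -0.329 if female else -0.411    # exponent below the kappa threshold
    ratio = scr_mg_dl / kappa
    egfr = (141.0
            * min(ratio, 1.0) ** alpha
            * max(ratio, 1.0) ** -1.209
            * 0.993 ** age_years)
    if female:
        egfr *= 1.018
    if black:
        egfr *= 1.159   # race coefficient as published in the 2009 equation
    return egfr

# Hypothetical patient: 60-year-old woman with serum creatinine 1.1 mg/dL.
print(round(ckd_epi_2009(1.1, 60, female=True), 1))
```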
Annual rate of GFR change over the follow-up period (in weeks) with repeated measures was analyzed by mixed models with unstructured covariance and adjusted for (1) baseline levels of hemoglobin A1c (HbA1c), systolic blood pressure, low-density lipoprotein, log-transformed urine albumin-to-creatinine ratio (UACR), and GFR, (2) baseline presence of symptoms or SSs, (3) gender, type of diabetes, smoking history, alcohol consumption, and obesity, (4) interaction between baseline GFR and visit, and (5) interaction between presence of symptoms or SSs and visit. The included biochemical parameters and epidemiological factors are known predictors of renal function decline. Baseline presence of symptoms, rather than continuous change of symptoms, was used as the explanatory variable because the analyses focused on prediction instead of continuous association. As previous studies showed that the majority of GFR decline is linear (44), the slope of change was analyzed with a linear growth model, with a sensitivity analysis using repeated measures of biochemical parameters and blood pressure as covariates instead of baseline values.
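One way to write the linear growth model described above is given below. The notation is ours, not the authors', and the random intercept and random slope structure is an assumption; the text specifies only a mixed model with unstructured covariance. Here $t_{ij}$ is the follow-up time of visit $j$ for patient $i$, $S_i$ is the baseline presence of a symptom or SS, $\mathrm{GFR}_{i0}$ is baseline GFR, and $\mathbf{x}_i$ collects the remaining baseline biochemical and epidemiological covariates.

```latex
\begin{aligned}
\mathrm{GFR}_{ij} &= \beta_0 + \beta_1 t_{ij} + \beta_2 S_i + \beta_3 (S_i \times t_{ij})
  + \beta_4 \mathrm{GFR}_{i0} + \beta_5 (\mathrm{GFR}_{i0} \times t_{ij})
  + \boldsymbol{\gamma}^{\top} \mathbf{x}_i + b_{0i} + b_{1i} t_{ij} + \varepsilon_{ij}, \\
(b_{0i}, b_{1i})^{\top} &\sim \mathcal{N}(\mathbf{0}, \mathbf{G}), \qquad
\varepsilon_{ij} \sim \mathcal{N}(0, \sigma^{2}).
\end{aligned}
```

Under this formulation, $\beta_3$ is the difference in the rate of GFR change between patients with and without the symptom at baseline, which, once rescaled to years, corresponds to the "faster annual GFR decline" estimates reported in ml/min/1.73m2. An unstructured $\mathbf{G}$ leaves the variances of $b_{0i}$ and $b_{1i}$ and their covariance unconstrained.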
Five multivariable regression models were built using GFR as the dependent variable to assess the power of explaining the variance in GFR (Table 1). Both cross-sectional regression models and longitudinal mixed models were evaluated.
Quality of the statistical models was assessed by the Akaike information criterion (AIC) and Bayesian information criterion (BIC). AIC and BIC assess the quality of statistical models in predicting GFR with different penalties on model complexity, and are commonly used as model selection criteria in machine learning (45). A lower AIC or BIC indicates a model that explains the variance better with less complexity, and such a model is regarded as better. GFR was selected as the dependent variable for the models as diabetic kidney disease (DKD) is the most prevalent and costly complication of diabetes with limited known progression risk factors (44). Prevention of renal function decline has been suggested as the top priority among diabetic patients in our previous focus group interview series (25). The efficiency of diagnostic models in predicting the rate of renal function decline has clinical significance (46). STATA 15.1 was used for analysis.
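The two criteria follow the standard definitions AIC = 2k − 2 ln L and BIC = k ln(n) − 2 ln L, where k is the number of estimated parameters, n the sample size, and L the maximized likelihood. The sketch below computes both and picks the lowest-scoring model. The model names, log-likelihoods, and parameter counts are hypothetical placeholders, not values from this study, and the choice of n (patients vs. observations) for a longitudinal model is an assumption.

```python
import math

def aic(log_likelihood, k):
    """Akaike information criterion: 2k - 2 ln L."""
    return 2 * k - 2 * log_likelihood

def bic(log_likelihood, k, n):
    """Bayesian information criterion: k ln(n) - 2 ln L."""
    return k * math.log(n) - 2 * log_likelihood

def best_models(models, n):
    """Return the names of the models with the lowest AIC and lowest BIC.

    `models` maps a model name to (log_likelihood, number_of_parameters).
    """
    best_aic = min(models, key=lambda m: aic(*models[m]))
    best_bic = min(models, key=lambda m: bic(*models[m], n))
    return best_aic, best_bic

# Hypothetical fits: a compact model vs. a larger, slightly better-fitting one.
candidates = {
    "hypothetical_compact": (-2200.0, 12),
    "hypothetical_large": (-2190.0, 40),
}
print(best_models(candidates, n=302))
```

Because ln(n) exceeds 2 once n is 8 or more, BIC penalizes each additional parameter more heavily than AIC, which is why reporting both gives a view of model quality under two different complexity penalties.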

Cross-Sectional Analysis of Individual Symptoms
The association between GFR and individual symptoms is summarized in Table 3.
(8) SS8: dry sensation and fatigue/malaise; (9) SS9: waist and knee musculoskeletal and otorhinolaryngological symptoms; (10) SS10: general weakness, waist and knee musculoskeletal symptoms, nocturia/frothy urine, and hyposexuality; and (11) SS11: signs of poor circulation (Supplementary Material 1). Invitations were sent to 11 experts. One declined due to work engagement and 10 experts joined the consensus panel (Supplementary Material 6). After 2 rounds of consensus, 90 symptoms reached consensus in total for the 11 SSs (Table 4). The inter-rater Cohen's Kappa between the consensus definitions of two-thirds majority and average score of 6.7 was 0.90 overall (0.89 and 0.90 for the first and second round of consensus, respectively). Sensitivity analyses showed that 39 symptoms reached consensus with the consensus definition of 6.7 average score or after imputation of missing values. They were listed as supplementary criteria that add likelihood for the classification of the corresponding SS (Table 4).
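The agreement statistic reported above compares two ways of labelling the same symptoms (consensus reached or not under each definition). A minimal sketch of Cohen's Kappa for two such label lists follows; the function name and the example labels are hypothetical and are not the study data.

```python
def cohens_kappa(labels_a, labels_b):
    """Cohen's Kappa for two equal-length lists of categorical labels."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    categories = set(labels_a) | set(labels_b)
    # Observed agreement: share of items given the same label by both rules.
    p_observed = sum(1 for a, b in zip(labels_a, labels_b) if a == b) / n
    # Expected agreement by chance, from each rule's marginal label frequencies.
    p_expected = sum(
        (labels_a.count(c) / n) * (labels_b.count(c) / n) for c in categories
    )
    if p_expected == 1.0:
        return 1.0  # both rules assign one identical label to every item
    return (p_observed - p_expected) / (1.0 - p_expected)

# Hypothetical labels: 1 = consensus reached, 0 = not reached, for 10 symptoms.
two_thirds_rule = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]
mean_score_rule = [1, 1, 1, 0, 0, 1, 1, 0, 1, 0]
print(round(cohens_kappa(two_thirds_rule, mean_score_rule), 2))
```

A Kappa close to 1 indicates that the two consensus definitions classify nearly the same symptoms, beyond what chance agreement alone would produce.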

DISCUSSION
This is the first attempt to compare modeling efficiency of renal function in diabetes with different combinations of biochemical parameters, demographics, and symptoms. We synchronized the symptom-based stratification framework through expert consensus and identified individual symptoms that better predict longitudinal renal function decline. The demographics of our cohort were comparable to those from general outpatient clinics in Hong Kong and other studies globally, supporting considerable generalizability (47). Patients had a moderate level of HbA1c, and the symptoms presented were likely residual to glycemic control and require attention.

Current Evidence of Symptom Clustering and Limitation
Symptoms develop from clustering of pathophysiology (9,13,20) and have been used in disease monitoring, or as a treatment response predictor (7,14). Distinctive clustering of frequent urination, excessive thirst, extreme hunger and unusual weight loss has characterized diabetes since 500 B.C. Increased fatigue, irritability and blurry vision have been added since then.
These symptoms have also been demonstrated to correlate with diabetes onset among non-diabetic population in the SHIELD study (21). Independent associations have been established between symptoms and CKD. For instance, dermatological changes (e.g., pruritus, xerosis) were shown to correlate with uricemia (48). DKD was also defined as a composite of clinical presentations (edema, heart failure), epidemiology (long standing diabetes, hypertension), histopathology (nodular glomerulosclerosis, thickened basement membranes), and biochemistry (increased urine albumin) by Paul Kimmelstiel in 1936 (49). Symptoms that associate with the renal progression could further stratify patients for personalized management.
Currently, there is no standardized symptom-based stratification framework for diabetes (50). We identified discrepancies in the listing of commonly presented SSs and the related symptoms across different guidelines. Our newly synchronized framework is more concise. For instance, the defining symptoms of SS1 were simplified from 31 possible symptoms to 9 key symptoms (Supplementary Material 1). This facilitates the comparison of utility between individual symptoms and symptom clusters.

Symptoms Are Associated With Poorer Renal Function
Infection susceptibility, forgetfulness, lumbago, knee buckling, and malar flush were commonly presented in our cohort. Classical presentation of frequent thirst and excessive hunger was less frequent, likely because the patients were already on hypoglycaemic agents that alleviated these symptoms.
From the regression analysis without adjustment of biochemical and epidemiological factors, a strong inverse dose-dependent relationship between GFR and nocturnal polyuria was observed, indicating a quantifiable assessment on nocturnal polyuria in clinical practice. Frequent urination, nocturnal polyuria, darkened complexion, dark lips, and dull purple color on lower limb are associated with lower GFR, which is commonly observed in practice.
After adjusting biochemical parameters and demographics including age, gender, HbA1c, and UACR, a set of different symptoms (edema, crenated tongue) were associated with poorer renal function independently. This is due to the collinearity between symptoms and biochemical parameters (e.g., frequent urination and UACR).
Symptoms from the unadjusted symptom-based model can be used to estimate renal function in resource-limited settings or in mass population screening without requiring laboratory investigations. Symptoms from the adjusted model, meanwhile, can increase the explanatory power on top of known risk factors and indicate pathophysiology independent of the control of glucose, blood pressure, lipid, and albuminuria.

Symptom Clusters Predict 4-Year Renal Function Decline Independently
Model quality assessment showed that using symptom clusters provides better quality in modeling renal function when compared to using individual symptoms. This is due to the natural collinearity of symptoms in which the local dependence between symptoms can be better analyzed by grouping symptoms with shared pathophysiology and predictive power on renal function (51).
In the 4-year follow-up cohort, symptom clusters provided additional explanatory power on top of biochemical and epidemiological factors in stratifying the decline of renal function. Early diabetes presenting with SS8, which included fatigue, malaise, dry mouth and dry throat, was associated with 1.1 ml/min/1.73m2 faster renal function decline annually.

Cohering Gastrointestinal Involvement and Potential Mechanism
The presence of epigastric bloating and of alternating dry and loose stool was associated with faster GFR decline after adjusting for blood glucose, blood pressure, lipid and urine albumin. From the 11 SSs that were documented in CM literature, diabetes was stratified into three main groups (Figure 1): (1) SS1-SS6: with gastrointestinal presentation, (2) SS7, SS8: with classical diabetes presentations, and (3) SS9-SS11: with neurological and vascular presentations. The symptom-based stratification of diabetes depends heavily (6 of 11 SSs) on gastrointestinal presentations.
The symptom analysis and expert consensus converged to suggest an important role of gastrointestinal involvement in the renal progression of diabetes. Previous studies showed that the gut microbiome is involved in the inflammatory and immune response in metabolic diseases, CKD and related phenotypical changes (52,53). We previously demonstrated in vivo that the toll-like receptor signaling pathway regulates DKD progression (54)(55)(56). Activation of the pathway via Paneth cells in the small intestine due to microbiota dysbiosis is a possible mechanism that warrants further study (57). Endotoxin (e.g., lipopolysaccharide) translocation to systemic circulation due to disruption of the gut barrier can also accelerate CKD progression (53).

Challenges in Symptom Standardization and Recent Advancement
Symptom-based research offers a new perspective in assessing correlations in pathogenesis among diseases, accelerating the discovery of secondary application of existing drugs (29). In this study, edema is a strongly independent cross-sectional and longitudinal predictor of reduced renal function. Further multi-level omics analysis stratifying the presence of edema with matching GFR and other confounders could reveal novel mechanisms in renal function decline among early diabetic patients.
A key challenge in symptom-based research is the standardization of subjective symptoms (9). Although the unharmonized definition of symptoms is likely to introduce random measurement error instead of systematic bias, and is unlikely to increase type I error in assessing model quality, a large sample size is needed to demonstrate statistical significance. In this study, thin tongue, deep pulse, and unsmooth pulse were strongly associated with faster renal function decline. Although these symptoms have long been suggested as signs of reduced kidney function and poor circulation in CM theory, standardized and digitalized measurement was lacking until the past decade (58)(59)(60). Recent advancement in phenotype-oriented networks in multi-omics data has provided a platform to link up phenotype, genotype, omics, and drug targets for the integration of clinical presentation, genetics, and epigenetics (5,9,50,61,62). These advancements enhance the clinical translation and generalizability of symptom-based diagnosis research to other parts of the world. Nevertheless, the description of pulse would require substantial work on standardization before being widely utilized.

Limitations
Our cohort was a convenience sample from public general outpatient clinics. Although the sampling was not random, selection bias was minimized by consecutive sampling. The majority of diabetes patients in Hong Kong have follow-ups in public general outpatient clinics. The demographics of our cohort matched the territory-wide demographics (47). Besides, there is no internationally standardized questionnaire for the assessment of symptoms. We developed a data entry form for this purpose with field tests, and only recorded the presence of symptoms to minimize the error due to physicians' judgment. Error in symptom assessment is likely random error instead of systematic bias as the assessors were blinded to the data analysis plan and biochemistry data. The measurement error is unlikely to increase type I error in assessing model quality. Furthermore, only a small group of experts with outstanding and diversified academic and clinical track records was purposively sampled to form the consensus panel for better participation and coordination, which is common for the Delphi technique (32). Further large-scale untargeted mining by latent class analysis can identify the natural clustering of symptoms. In the analysis, prescriptions were not analyzed as the control of blood glucose, blood pressure, lipid, and urine albumin was reflected by the change of HbA1c, systolic blood pressure, low-density lipoprotein, and UACR. Lastly, this study was designed to compare the quality of model performance and was not powered to evaluate the longitudinal predictiveness of each SS. The SSs that did not reach statistical significance are subject to type II error. Further large-scale correlation studies are needed to precisely assess the clinical and statistical significance of each SS.

CONCLUSION
Our findings showed that symptoms, in particular gastrointestinal symptoms, are strongly and independently associated with renal function decline among diabetic patients. Add-on symptom-based stratification improved the classification of diabetes in predicting glomerular filtration rate decline. Our study proved the concept and clinical value of using symptoms and symptom clusters to predict disease progression either alone or in combination with key biochemical and epidemiological factors. Further prediction algorithms of mixed molecular-symptom diagnosis are warranted. Dynamic change of symptoms should be considered in clinical practice and research design.

EXISTING EVIDENCE
Previous UK Biobank studies showed that phenomes, including symptoms and physical measurements, had excellent prediction on long-term clinical outcomes in general population. Symptom-based research in diabetes and renal medicine is limited when compared to other disciplines.

ADDED VALUE
• In this 4-year cohort, edema and gastrointestinal symptoms were independently associated with faster renal function decline among diabetic patients after adjusting for epidemiological factors and control of blood glucose, blood pressure, lipid, and albuminuria.
• Using symptom clusters instead of individual symptoms as the independent variable in statistical models reduced computational dimension and improved model quality (increased explanatory power and less complexity).
• Clustering analysis revealed that patients presenting with a constellation of fatigue, malaise, dry mouth and dry throat were independently associated with faster renal function decline.

IMPLICATION
Add-on symptom-based diagnosis improves the predictive power on renal function decline among diabetic patients in addition to key biochemistry and epidemiological factors. Further prediction algorithms of mixed molecular-symptom diagnosis are warranted. Dynamic change of symptoms should be considered in clinical practice and research design.

DATA AVAILABILITY STATEMENT
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by Institutional Review Board of the University of Hong Kong/Hospital Authority Hong Kong West Cluster. Written informed consent for participation was not required for this study in accordance with the national legislation and the institutional requirements.