Artificial Intelligence in Predicting Systemic Parameters and Diseases From Ophthalmic Imaging

Artificial Intelligence (AI) analytics has been used to predict, classify, and aid clinical management of multiple eye diseases. Its robust performance has prompted researchers to expand the use of AI to predicting systemic, non-ocular diseases and parameters from ocular images. Herein, we discuss the reasons why the eye is well-suited for systemic applications, and review the applications of deep learning on ophthalmic images in the prediction of demographic parameters, body composition factors, and diseases of the cardiovascular, hematological, neurodegenerative, metabolic, renal, and hepatobiliary systems. Three main imaging modalities are included: retinal fundus photographs, optical coherence tomography scans, and external ophthalmic images. We examine the range of systemic factors studied from ophthalmic imaging in the current literature and discuss areas of future research, while acknowledging the current limitations of AI systems based on ophthalmic images.


INTRODUCTION
Artificial Intelligence (AI) has revolutionized clinical diagnosis and management of diseases in modern-day healthcare. Most AI algorithms built for healthcare applications are supervised machine learning (ML) models: the desired solutions, or labels, are provided as inputs alongside the training examples. Iterative optimization and pattern recognition then allow trained models to predict labels in previously unseen test examples. Deep learning (DL) is a subset of ML comprising neural networks, which are adept at computerized visual perception and image recognition. DL algorithms have thrived in image-centric specialties such as ophthalmology (1)(2)(3), dermatology (4), radiology (5,6), pathology (7,8), and many others. In ophthalmology, the applications of AI in detecting ophthalmic diseases based on images are well-established. These include diabetic retinopathy (9)(10)(11), age-related macular degeneration (11)(12)(13)(14), glaucoma (11), refractive error (15), and retinopathy of prematurity (16,17). In recent years, AI-based analytics applied to ophthalmic images has shown its ability not only in detecting ocular diseases, but also in estimating systemic parameters and predicting non-ocular diseases.
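The supervised learning workflow described above can be illustrated with a minimal sketch: labels accompany the training examples, an iterative optimizer fits the model, and the fitted model then predicts labels for unseen test examples. This toy logistic regression on synthetic two-dimensional data is purely illustrative; the models reviewed here are deep convolutional networks trained on ophthalmic images.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic 2-D "features" for two classes (e.g., disease vs. no disease),
# with labels supplied alongside the training examples.
X_train = np.vstack([rng.normal(-2, 1, (50, 2)), rng.normal(2, 1, (50, 2))])
y_train = np.array([0] * 50 + [1] * 50)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Iterative optimization: gradient descent on the logistic loss.
w, b = np.zeros(2), 0.0
for _ in range(500):
    p = sigmoid(X_train @ w + b)
    w -= 0.5 * (X_train.T @ (p - y_train)) / len(y_train)
    b -= 0.5 * np.mean(p - y_train)

# Previously unseen test examples: the trained model predicts their labels.
X_test = np.vstack([rng.normal(-2, 1, (20, 2)), rng.normal(2, 1, (20, 2))])
y_test = np.array([0] * 20 + [1] * 20)
y_pred = (sigmoid(X_test @ w + b) > 0.5).astype(int)
accuracy = float(np.mean(y_pred == y_test))
```

The same train/optimize/predict loop underlies the deep models discussed throughout this review, only with images as inputs and far larger parameter counts.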
The eye is a uniquely accessible window that allows direct visualization of neuro-vasculature using non-invasive imaging modalities. Because the retina and other end organs, such as the brain and kidneys, share similar anatomical and physiological properties, retinal vessels are an indirect representation of the systemic microvasculature (48)(49)(50). Analysis of microvascular changes provides valuable information, as such changes often precede macrovascular diseases such as stroke and ischemic heart disease. Additionally, the retina is an extension of the central nervous system (CNS), and optic nerve fibers are effectively CNS axons. Many neurodegenerative conditions that involve the brain and spinal cord have ocular manifestations (51,52). Retinal nerve fiber layer (RNFL) thickness (53) and visual acuity (54,55) have been associated with early-stage cognitive impairment. Furthermore, the external eye (i.e., conjunctiva) is a primary area where clinical signs of jaundice, cholesterol deposits, and anemia manifest. Finally, the technology-dependent and image-centric nature of ophthalmology greatly facilitates the accumulation of imaging datasets required for the development of AI algorithms. Hence, ophthalmic imaging coupled with AI analytics has great potential to predict systemic biomarkers and disease.
This review discusses the applications of AI analytics in predicting systemic parameters or disease from ophthalmic images. We provide an overview of the major ophthalmic imaging modalities currently used in AI and discuss how these images were used in the prediction of demographic parameters, body composition factors and diseases of the cardiovascular, hematological, neurodegenerative, metabolic, endocrine, renal, and hepatobiliary systems.

METHODS
For this narrative review, electronic bibliographic searches were conducted in PubMed, EMBASE, and Web of Science up to 1 February 2022. MeSH terms and all-field search terms were searched for "artificial intelligence," "neural networks," "machine learning," "deep learning," "imaging," and "eye." Search results were screened for relevance. References cited within the identified articles were used to further augment the search. Abstracts, reviews, correspondence, opinions, editorials, and letters were excluded. Studies were included if they used an ophthalmic imaging modality to predict or quantify a systemic, non-ocular condition or laboratory parameter. This review encompassed an international search, but only articles published in English were used. Information extracted for qualitative analysis included study details, model architecture, dataset, population, imaging modality, body system/disease, internal/external validation methods, reference standard, and raw diagnostic accuracy data. This review is limited to articles published from 2012 onwards.

OPHTHALMIC IMAGES AS INPUT TO PREDICTIVE MODELS
Many imaging modalities are clinically available in ophthalmology: retinal fundus photography (RFP), optical coherence tomography (OCT), OCT-Angiography (OCT-A), fluorescein angiography, ultrasound biomicroscopy, and anterior segment photographs, among others. Regarding input images, the development of robust AI models requires meaningful data at a sufficient scale, which can be difficult to acquire. Publicly available datasets help address this need, such as the Media Research Lab Eye (MRL Eye) dataset for external eye photographs (57). In the prediction of systemic biomarkers and diseases, a similar trend holds: the most widely used ophthalmic imaging modality is RFP, followed by OCT, then external eye images (such as anterior segment photographs or slit lamp photographs) (Table 1, Figure 1).

RETINAL FUNDUS PHOTOGRAPHY
RFP is a low-cost, simple imaging technique with widespread applications. Fundus cameras have evolved over time, from traditional table-top cameras to hand-held and smartphone-based cameras. In addition to portability, advancements in medical technology have allowed sharper images, non-mydriatic wide-field options, and pupil tracking. Panwar et al. (64) reviewed the twenty-first century advancements in RFP technology and discussed the pros and cons of various types of fundus cameras. While the portability and reduced cost of newer devices are welcome for mass screening purposes, traditional office-based fundus cameras remain a mainstay for research purposes because they generally provide the best image quality and have strong clinical validation in comprehensive clinical trials. The study by Poplin et al. (36), published in March 2018, was one of the earliest major studies to predict systemic biomarkers from RFP. The study, conducted by a team of researchers from Google AI and Stanford School of Medicine, introduced the idea that robust RFP-based models can be trained to predict a wide range of non-ocular parameters. Supplementary Table 1 summarizes the performances of RFP-based models in predicting non-ocular diseases and parameters. Anatomically, the fovea, macula, optic disc, and retinal vessels have all been described as essential structures used by AI models for prediction and classification (Figure 2).

Predicting Age and Gender From RFP
Nine studies predicted age or gender from RFPs (30,31,34,36,38,(45)(46)(47)60). Age as a continuous parameter showed robust predictability in internal datasets (R²: 0.74-0.92). Rim et al. (38) additionally investigated model performance in external datasets (R²: 0.36-0.63), showing limited generalizability. In subgroup analysis of the Singapore Epidemiology of Eye Diseases (SEED) dataset, age was well-predicted across Chinese, Indian, and Malay ethnic groups. As a follow-up to Poplin et al. (36), which showed that RFP could be used to predict gender, Yamashita et al. (45) tried to understand what features are identified by algorithms as useful in predicting gender. They performed logistic regression on several features identified to be associated with sex, including the papillomacular angle and fundus tessellation. Reported performance of gender prediction has varied across studies (34,46). The reasons for this disparity could include the field of view of the RFP dataset, and whether the images were derived from healthy or diseased patient populations. Gerrits et al. (47) performed a similar analysis of age and gender in a Qatari dataset and suspected that their algorithm could be indirectly predicting age or gender during its performance on other intended biomarkers. For example, substantial differences in model performance were found between females and males for relative fat mass and testosterone. However, the performance of gender prediction in age-stratified subgroups, and vice versa, was similar, suggesting that the features used during age and gender prediction are largely independent (47). In analysis of activation maps, Munk et al. (34) and Poplin et al. (36) reported that the optic disc, macula, peripapillary area, and larger blood vessels within the posterior pole seem crucial for gender and age prediction. Non-random sex prediction using RFP seems possible only if the fovea and optic disc are visible (34). Korot et al. (31) experimented with a code-free model to predict gender (AUC: 0.93).
The Google Cloud automated machine learning (AutoML) platform was used to provide a graphical user interface (GUI), allowing physicians with no coding background to craft ML models for medical image analysis. This suggests that a code-free framework can be comparable to state-of-the-art algorithms designed for similar tasks by coders. Nevertheless, we note that using AI to predict age and gender has inherently limited clinical utility; these were, however, two of the earliest parameters to be predicted from RFPs by neural networks, as they are unambiguous and easily available as data.
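The age-prediction results above are reported as R², the coefficient of determination between predicted and true values. As a brief sketch of the computation (on synthetic values, with a hypothetical model error of ~4 years; none of these numbers come from the studies cited):

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return float(1.0 - ss_res / ss_tot)

rng = np.random.default_rng(4)
true_age = rng.uniform(40, 80, 500)            # synthetic cohort ages
pred_age = true_age + rng.normal(0, 4, 500)    # hypothetical model error, SD ~4 years

r2 = r_squared(true_age, pred_age)             # high R²: ages well predicted
```

An R² near 1 means the predictions track the true values closely; an R² near 0 means the model does no better than predicting the cohort mean, which is why the low external-validation R² values reported by Rim et al. (38) indicate limited generalizability.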

Predicting Smoking and Alcohol Status From RFP
Regarding smoking and alcohol status, current models describe notable prediction performance (36,43,46,47). AUC of smoking status ranged from 0.71 to 0.86. Only one study by Zhang et al. (46) predicted alcohol status (AUC: 0.95). "Alcohol status" was defined as "current alcohol drinkers of >12 times in the past year" (46). One must note that the "ground-truths" for these parameters are self-reported from patients via questionnaires. Hence, model performance would be limited by information bias and patients' truthfulness when stating their smoking frequency and alcohol intake.

Predicting Body Composition Factors From RFP
Body composition factors predicted from RFP include body mass index (BMI), body muscle mass, height, weight, relative fat mass, and waist-hip ratio (WHR) (36,38,46,47). The performance of current algorithms in BMI prediction is generally poor, with low R²-values (R²: 0.13-0.17). Model generalizability across ethnically distinct datasets was poor as well. Rim et al. (38) found that DL algorithms for prediction of height, body weight, and BMI (and other non-body composition factors), trained on a South Korean dataset, showed limited generalizability in the UK Biobank dataset (majority White ethnicity) (R² ≤ 0.08).
Proportional bias was observed, where predicted values in the lower range were overestimated and those in the higher range were underestimated. While BMI is a parameter of interest due to its well-established associations with all-cause (65) and cause-specific mortality (66), prediction of other plausible parameters of body composition has been described. The prediction of body muscle mass is noteworthy, as it is a potentially more reliable biomarker than BMI for cardiometabolic risk and nutritional status (38). Rim et al. (38) reported that body muscle mass could be predicted with an R² of 0.52 (95% CI: 0.51-0.53).
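The proportional bias described above can be checked by regressing the prediction error against the true value: a negative slope means low values are overestimated and high values underestimated. The sketch below uses synthetic data and a hypothetical model whose predictions are shrunk toward the cohort mean, a common pattern for regression models; it is not derived from any cited study.

```python
import numpy as np

rng = np.random.default_rng(1)
true_bmi = rng.uniform(17, 35, 200)                       # synthetic "ground truth" BMI

# Hypothetical model: predictions shrunk toward the mean, plus noise.
pred_bmi = 26 + 0.6 * (true_bmi - 26) + rng.normal(0, 1, 200)

# Regress error on the true value (a Bland-Altman-style check).
error = pred_bmi - true_bmi
slope, intercept = np.polyfit(true_bmi, error, 1)

# slope < 0: overestimation at the low end, underestimation at the high end.
proportional_bias = slope < 0
```

A slope near zero would indicate uniform error across the measurement range; the clearly negative slope here reproduces the pattern reported for the body-composition models.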

Predicting Hematological Parameters From RFP
Hematological parameters predicted from RFP include anemia, hemoglobin concentration, red blood cell (RBC) count, and hematocrit (33,38,46). Ophthalmic imaging-based DL algorithms have been used to predict cut-off points of hematological parameters (as a classification task). For instance, Mitani et al. (33) predicted anemia categories and Zhang et al. (46) predicted hematocrit ranges from fundus photographs with AUC > 0.75. There were also attempts to predict continuous parameters, such as RBC count (33), hemoglobin (38), and hematocrit (33,38), from fundus photographs, but performances were poorer (RBC count: R² 0.14-0.35; hemoglobin: R² 0.06-0.56; hematocrit: R² 0.09-0.57). Mitani et al. (33) further studied the importance of different anatomical features to anemia prediction by blurring and cropping the RFPs during both training and validation. Notably, when the upper and lower hemispheres of the images were progressively masked, performance declined only after ∼80% of the image was covered. In contrast, masking with a central horizontal stripe (covering the disc and macula) caused a drop in AUC when only 10% of the image was masked. The models performed better than chance even after high-resolution information was removed with substantial Gaussian blurs, and after image pixels were randomly scrambled, suggesting that the models could make use of the general pallor of the retina to predict anemia.
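The masking ablation described above can be sketched as two occlusion operators, one masking the image from the top and bottom hemispheres and one masking a central horizontal stripe; the model itself is omitted here, and the masking functions are illustrative stand-ins rather than the exact procedure of Mitani et al. (33).

```python
import numpy as np

def mask_hemispheres(img, fraction):
    """Zero out `fraction` of the rows, split between top and bottom."""
    out = img.copy()
    k = int(img.shape[0] * fraction / 2)
    if k > 0:
        out[:k] = 0
        out[-k:] = 0
    return out

def mask_central_stripe(img, fraction):
    """Zero out a horizontal stripe of height `fraction`, centered vertically."""
    out = img.copy()
    h = img.shape[0]
    k = int(h * fraction)
    start = (h - k) // 2
    out[start:start + k] = 0
    return out

img = np.ones((100, 100))                 # stand-in for a fundus photograph
hemis = mask_hemispheres(img, 0.8)        # 80% masked from top and bottom
stripe = mask_central_stripe(img, 0.1)    # 10% central stripe (disc/macula region)

visible_after_hemis = hemis.mean()        # 0.2 of the pixels remain
visible_after_stripe = stripe.mean()      # 0.9 of the pixels remain
```

The asymmetry of the reported results is striking: hiding 80% of the periphery barely hurt performance, while hiding the 10% stripe over the disc and macula did, pointing to where the anemia signal resides.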

Predicting Neurodegenerative Disease From RFP
Most studies in current literature that predicted neurodegenerative disease used OCT-based models. These will be elaborated on in sections below.

Implications and Clinical Utility
Prediction of systemic disease from RFPs is a hotly studied topic, and seems like the logical next step, given robust existing algorithms for predicting ocular diseases (for instance, diabetic retinopathy, age-related macular degeneration, and glaucoma) from RFPs (82). Prediction of certain outcomes, such as age, gender, weight, and BMI, may not be particularly meaningful, given the ease of determining or measuring these outcomes without a complex computer algorithm. For more novel outcomes, such as Alzheimer's Disease, chronic kidney disease (CKD), atherosclerosis, and coronary artery calcification (CAC), crafting algorithms to predict the incidence of these conditions, rather than their prevalence, might serve more clinical utility for early intervention. However, in reality, robust incidence data are logistically more difficult to acquire than prevalence data. Next, the introduction of smartphone-based fundus imaging in recent years presents a low-cost alternative to conventional RFP (83). There are several advantages of smartphone-based imaging, including portability, built-in connectivity and processing, and minimal need for training. This could make it suitable for telemedicine or primary screening purposes, particularly in lower-income settings where tertiary care may not be easily accessible. However, smartphone fundus image quality varies considerably, and there is a need for inter-device comparison, prompting researchers to call for a reference standard for grading (83).

OPTICAL COHERENCE TOMOGRAPHY
OCT is a non-invasive diagnostic technique that provides high-resolution in vivo cross-sectional images of retinal and choroidal structures. As OCT is a safe, fast, and non-invasive imaging modality with wide applicability in eye clinics, this technology has produced large volumes of clinical images (second only to RFP), making it a suitable candidate for training AI models. Kapoor et al. (84) have previously reviewed the applications of AI and OCT in ophthalmology, including the detection of macular edema (85), age-related macular degeneration (86), and glaucoma (87,88). OCT-A is an advancement of OCT technology, based on the variable backscattering of light by moving red blood cells. This motion-contrast imaging accurately depicts retinal vessels through different segmented areas of the eye, eliminating the need for intravascular dyes (89). Unlike RFP-based AI models, the systemic applications of AI and OCT or OCT-A are more limited in the current literature (Table 2). Only one study, by Aslam et al. (18), predicted diabetic status with OCT-A using various supervised ML architectures, reporting an AUC of 0.80 for the best-performing random forest model. However, the model suffered from low specificity. OCT-A based outcome measures used to predict diabetes included ischemic areas around the foveal avascular zone (FAZ), FAZ circularity, mean capillary intensity, and mean vessel intensity (18). Readers should be aware that using such OCT-A derived metrics as inputs, rather than the OCT-A image itself, is a fairly different task compared to using RFPs as inputs.
OCT models were largely used to predict neurodegenerative diseases, including multiple sclerosis (MS), Alzheimer's Disease, and Parkinson's Disease (PD) (20,35,62). We observed that the models in this section were shallow learning algorithms (support vector machines and random forests) as opposed to neural networks. Clinical studies have shown robust differences between the retinas of people with MS and healthy controls in the peripapillary RNFL and the macular ganglion cell layer-inner plexiform layer (91). The promising results of these studies suggest that OCT scans incorporated with AI analytics could have some utility as a screening adjunct. Thanks to an abundance of OCT scans in modern tertiary eye centers, AI-based analysis of OCT images has expanded to improve patient screening and facilitate clinical decision-making. Given that OCT parameters evaluate retinal and choroidal layers, a further step for future research could be exploring the utility of such parameters (for instance, choroidal thickness, choroidal vascularity index, and retinal nerve fiber layer thickness) via machine learning techniques, relative to deep learning techniques where the algorithms are fed whole images. Regarding future trends, most current published studies in AI and OCT imaging focus on the posterior segment of the eye, but recent studies have started to explore its use in the anterior segment as well (84).

EXTERNAL EYE IMAGING
Photographs of the external eye, often captured with cameras mounted on slit lamps, are used to document anterior segment disease in ophthalmology. Systemically, AI studies in the current literature have reported the use of such images to predict gender, HbA1c levels, diabetic status, anemia, and various liver pathological states (Table 2) (22,27,29,40,44,59). As described in earlier sections, Xiao et al. (44) constructed two sets of models (slit lamp based and RFP based) to predict hepatobiliary disease states; model performance on slit lamp images was better than on RFP for liver cancer, cirrhosis, and chronic viral hepatitis. Excessive bilirubin accumulation causing yellowing of the sclera and conjunctiva is a common presentation of compromised liver function. These robust manifestations, detectable on external eye images, could explain the difference in performance. Visualization techniques showed that, in addition to the conjunctiva and sclera, iris morphology and color contained important predictive features (44), suggesting the presence of iris morphological changes secondary to liver damage that have yet to be elucidated. Babenko et al. (59) predicted HbA1c at cut-offs of 7, 8, and 9% using external eye images from EyePACS, a teleretinal screening service in the United States (92). Low-resolution images of 75 × 75 pixels (0.1% of the resolution of an 8-megapixel smartphone camera) as inputs achieved moderate model performances of AUC 0.64-0.74. Ablation analysis and saliency maps indicated that information from the center of the image (pupil/lens, iris, cornea, limbus) was most related to HbA1c (59). Uses for such a screening system are manifold.
Thresholds of HbA1c > 9% could highlight diabetic patients with difficulties controlling blood glucose levels and in need of closer follow-up or medication changes; thresholds of HbA1c > 7% could identify asymptomatic patients at risk of early or mild diabetes, allowing referral for a confirmatory blood test. Regarding anemia, while phlebotomy remains the gold standard of diagnosis, physical examination of the palpebral conjunctiva is a quick but subjective clinical assessment method. Chen et al. (22) managed to predict hemoglobin levels of < 11 g/dL from external eye images of the palpebral conjunctiva. However, the dataset was small (50 images), and the model thus requires more input data and validation on external datasets.
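Framing a continuous lab value as classification tasks at several cut-offs, as in the HbA1c study above, means each cut-off yields its own binary label and its own AUC. The sketch below shows this framing with a rank-based (Mann-Whitney) AUC on synthetic values; the scores are hypothetical stand-ins for model outputs, not results from the cited study.

```python
import numpy as np

def auc(labels, scores):
    """Rank-based AUC: probability a positive case outranks a negative one."""
    order = np.argsort(scores)
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)
    pos = labels == 1
    n_pos, n_neg = pos.sum(), (~pos).sum()
    return float((ranks[pos].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg))

rng = np.random.default_rng(2)
hba1c = rng.uniform(5, 12, 300)               # synthetic "ground truth" HbA1c (%)
scores = hba1c + rng.normal(0, 1.5, 300)      # noisy hypothetical model output

# One binary task (and one AUC) per clinically motivated cut-off.
aucs = {cut: auc((hba1c > cut).astype(int), scores) for cut in (7, 8, 9)}
```

One design consequence of this framing is that a single underlying model can serve several screening purposes at once, simply by reading off different operating points, as the follow-up vs. referral use cases above illustrate.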
Looking beyond diabetes, liver diseases and anemia, the findings of the above studies raise the interesting possibility that external eye images could contain useful signals, both familiar and novel, related to other systemic conditions. For example, hyperlipidemia and atherosclerosis can manifest with xanthelasma (93). Thyroid eye disease can manifest with chemosis, conjunctival injection, lid retraction and lower scleral show (94). Obstructive sleep apnea is associated with floppy eyelid syndrome (95). Neurofibromatosis Type 1 manifests with melanocytic hamartomata of the iris (Lisch nodules) (96). Myasthenia Gravis can present with ptosis and ocular dysmotility (97). Dry eyes, conjunctival injection, and uveitis are all possible manifestations of systemic lupus erythematosus (98), while corneal deposits of uric acid have been reported in hyperuricemia and gout (99). Such manifestations could be readily captured on external eye photography for systemic disease prediction models. While these suggested diseases are relatively common, the practicality of such models would depend on the rarity of the associated eye signs, the fact that laboratory screening tests are much more commonplace, and whether such theoretical models can be built in the first place.

Areas of Potential Improvement
We have noted several limitations of existing work and areas with untapped potential. Firstly, many current studies lack external validation (Table 1), which is critical for establishing robust and generalizable AI models. Sole internal validation cannot support firm conclusions regarding the algorithms' value for disease screening in new populations. The ability of predictive models to generalize across various ethnic and geographical datasets is neither a guarantee nor a simple task to achieve, but would add greatly to the clinical utility of the constructed AI system. Second, the field of ophthalmic imaging has unrealized potential in predicting additional systemic parameters. Several studies attempted predictions of other markers in addition to those reported, albeit with varying (and often poorer) results (38,46,47). For instance, Rim et al. (38) performed analysis on 47 biomarkers in total, although only 10 were eventually deemed "predictable." The fields of predicting hepatobiliary and neurodegenerative disease from ophthalmic imaging are particularly nascent. The models described by Xiao et al. (44) in 2021 were the first to establish qualitative associations between ocular features, liver cancer, and cirrhosis, and future studies are needed to reaffirm their findings. Much of the ongoing work bridging neurodegenerative disease and retinal imaging involves OCT, although vascular features on RFP have shown meaningful associations with cognitive decline (75). Third, OCT-based algorithms to predict renal disease have not been explored in the current literature. OCT, unlike RFP, allows imaging of the choroidal vasculature, and choroidal thinning has been associated with lower eGFR and higher microalbuminuria independent of age and other vascular risk factors (100,101).
Whether these OCT-based metrics reflect renal microvascular damage better than standard creatinine/eGFR/albumin-creatinine-ratio measurements could be tested in future studies, although we expect that this is unlikely, and it would be difficult to conduct such a comparative study. Fourth, given the widespread availability of OCT, slit-lamp imaging and RFP in ophthalmic clinical practice, AI systems built on two or more different ophthalmic imaging methods would provide alternatives and improve adaptability. Fifth, there is good potential for AI systems built on ophthalmic imaging in community screening programs or primary care settings. In principle, addition of various predicting models for systemic biomarkers to current teleophthalmology software could enable low-cost, non-invasive screening for multiple diseases in the general population. Aside from clinical validation, economic viability and cost-effectiveness would have to be evaluated as well. Sixth, most studies predicting systemic parameters from ophthalmic imaging are estimating current or prevalent disease. To predict incidence of these conditions, rather than prevalence, might serve more clinical utility; much potential utility of AI systems would be unlocked if they were able to detect disease where standard clinical examinations or laboratory tests fail to do so. Seventh, studies evaluating the ability of AI ophthalmic imaging algorithms to detect longitudinal changes in systemic disease, or to stage systemic disease severity, are currently lacking. This could be an area of future interest.

Challenges in Research
There are several challenges to be appreciated as AI becomes more integral to medical practice. Firstly, using ophthalmic imaging to predict systemic disease would require collaborative efforts across departments. This might pose difficulties, as systemic parameters are not always required for management in ophthalmic clinics, and vice versa. Hence, input images and target variables may need to be collected separately and deliberately (102). Secondly, barriers of access to ophthalmic imaging datasets, including issues of cost, time, usability, and quality, need to be reduced (56). Third, labeling processes for publicly available datasets are often poorly defined; assurance of labeling accuracy is paramount because the standards used for labeling ground truths have implications for any AI model trained on the dataset. Fourth, it may sometimes be necessary to acquire datasets from different local and international centers for training or external validation purposes. State privacy and data regulatory rules need to be respected, a process which is time-consuming and costly. Fifth, most of the datasets used for developing or testing DL models are retrospective. Further validation using well-characterized prospective datasets would be needed to assess clinical utility.

Challenges in Real-World Applications
Regarding real-world applications, high-quality ophthalmic images may be difficult to acquire in patients with small pupils. Such patients may require pupil dilation with topical pharmaceuticals, increasing collection time per image. Databases to save and transfer high-quality images are needed. Also, the potential for bias or error must be respected. Algorithmic outcomes reflect the data used to train them; they can only be as reliable (and as neutral) as the data they are based on (103). Projection of biases inherent in the training sets by AI systems is a concern for medical ethics (104), and ensuring generalizability across different geographical and ethnic groups is essential to avoid inadvertent, subtle discrimination in healthcare delivery (105). Next, cost-effectiveness studies are required before real-world implementation. Retinal images are currently used in the diagnosis of ophthalmic pathologies. For systemic disease, however, the use of retinal images is not part of standard care. Cost-effectiveness studies are needed to justify their use over or alongside current standard tests (for example, diagnosing anemia using retinal images vs. a full blood count), many of which are well-integrated into existing healthcare practice and infrastructure. Finally, DL algorithms suffer from the "black box" problem: they disclose inputs and outputs but give no view of the intermediate processes. While it is common for studies to provide overlay saliency maps for explanatory purposes, it often remains unclear how the algorithms arrived at their predictions.
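The saliency maps mentioned above highlight the pixels whose perturbation most changes a model's output. A minimal sketch of the idea, using a finite-difference gradient and a toy linear scorer (a stand-in, not a real deep network), shows how such a map can recover the region a model attends to:

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy "model": a linear scorer that attends only to the central 2x2 region.
weights = np.zeros((8, 8))
weights[3:5, 3:5] = 1.0

def model(img):
    return float((img * weights).sum())

def saliency(img, eps=1e-3):
    """Finite-difference gradient of the model output w.r.t. each pixel."""
    grad = np.zeros_like(img)
    base = model(img)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            bumped = img.copy()
            bumped[i, j] += eps
            grad[i, j] = (model(bumped) - base) / eps
    return np.abs(grad)

img = rng.random((8, 8))       # stand-in for an input image
sal = saliency(img)            # bright exactly where the toy model "looks"
```

For a transparent linear model the saliency map exactly recovers the weights, but for a deep network the map is only a local, approximate explanation, which is why saliency overlays alone do not resolve the "black box" concern.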

CONCLUSIONS
To date, RFP, OCT, and external eye imaging are the leading ocular imaging modalities for systemic AI applications. Ophthalmic AI modeling for predicting systemic disease is a nascent field, but there is great capacity for translation into wider practice in the future, if the technology is carefully designed, operated, and monitored under the supervision of clinicians. Further efforts are underway to explore other systemic risk factors and parameters that could be predicted from ophthalmic images. If validated, these algorithms could be implemented as adjunctive screening tools in primary care settings. Prospective studies are needed to evaluate real-world reliability, efficacy, and cost-effectiveness, and to gain acceptance from various stakeholders. Collaborative efforts are needed to ensure the best medical technology available is incorporated into practice for the benefit of patients.

AUTHOR CONTRIBUTIONS
TR and C-YC conceived and planned the study. BB performed the literature search, organized the database, and wrote the first draft of the manuscript. BB, TR, CS, and C-YC wrote sections of the manuscript, contributed to interpreting the results, and provided critical feedback to the manuscript. All authors contributed to the intellectual development of this paper. The final version of the paper has been seen and approved by all authors.