Sex-related difference in the retinal structure of young adults: a machine learning approach

Purpose To compare the accuracy of machine learning (ML) algorithms to classify the sex of the participant from retinal thickness datasets in different retinal layers. Methods This cross-sectional study involved 26 male and 38 female subjects. Data were acquired using HRA + OCT Spectralis, and the thickness and volume of 10 retinal layers were quantified. A total of 10 features were extracted from each retinal layer. The accuracy of various algorithms, including k-nearest-neighbor, support vector classifier, logistic regression, linear discriminant analysis, random forest, decision tree, and Gaussian Naïve Bayes, was quantified. A two-way ANOVA was conducted to assess the ML accuracy, considering both the classifier type and the retinal layer as factors. Results A comparison of the accuracies achieved by various algorithms in classifying participant sex revealed superior results in datasets related to total retinal thickness and the retinal nerve fiber layer. In these instances, no significant differences in algorithm performance were observed (p > 0.05). Conversely, in other layers, a decrease in classification accuracy was noted as the layer moved outward in the retina. Here, the random forest (RF) algorithm demonstrated superior performance compared to the others (p < 0.05). Conclusion The current research highlights the distinctive potential of various retinal layers in sex classification. Different layers and ML algorithms yield distinct accuracies. The RF algorithm’s consistent superiority suggests its effectiveness in identifying sex-related features from a range of retinal layers.


Introduction
Over the past 30 years, optical coherence tomography (OCT) has been used as a non-invasive method for image acquisition to evaluate the anterior and posterior segments of the eye in both diseased and healthy conditions of the human retinal structure (1-3). The development of eye diseases can occur throughout life due to the natural aging process, exposure to unhealthy lifestyle habits, systemic disorders, or genetic inheritance. In addition to these factors, sex-related factors, such as the concentrations of sex hormones that vary throughout an individual's life, can also influence the development of eye diseases (4-6).
The existence of sexual dimorphism of the retina in humans has been investigated using OCT. The first findings of retinal sexual dimorphism pointed to a larger total retinal thickness in male subjects than in female subjects (7-12). However, the debate regarding retinal layers remains open, as some studies have observed that some retinal layers are thicker in male subjects than in female subjects, while other investigations have found no or few sex-related differences (13-19).
Overall, investigating sex-related features in the human retina is an important area of research that could lead to new insights into the causes of retinal diseases, the development of sex-specific treatments, the design of more effective medical devices for the eye, and the possible impact of postmenopausal hormone replacement or anti-estrogenic therapy (20).
Due to the large amount of data extracted from the retina during an OCT scan, machine learning methods are a candidate approach for analyzing OCT data. Machine learning methods have been used due to their ability to capture complex relationships, work with high-dimensional data, generalize to new data, be flexible and adaptable, and perform automated learning of relevant features, reducing the need for human intervention (21-23). Compared to norms based on population averages, which may not account for the significant individual variability that exists within each sex group, machine learning models can capture and leverage this variability, allowing for more precise and individualized assessments of retinal thickness. This individualized precision can be particularly valuable in clinical decision-making as it takes into account the uniqueness of each patient's condition. Additionally, retinal thickness datasets can exhibit complex patterns and subtle variations that may not be fully captured by simple norm-based criteria.
In the present study, we aimed to evaluate the performance of several machine learning algorithms to predict the sex of the participants based on information from retinal structure features. Our primary goal was to identify which retinal layers are best to correctly classify the sex of the participant and which machine learning algorithms are better for predicting the participant's sex in the different retinal layers.

Ethical considerations
The present study was approved by the Ethical Committee for Research in Humans of the Universidade Federal do Pará (report number 3.285.557). All participants were informed about the experimental procedures and gave written consent to participate in the study.

Participants
The sample consisted of 26 male participants (mean age ± standard deviation: 26.19 ± 4.96 years) and 38 female participants (mean age ± standard deviation: 26.05 ± 4.68 years). All participants had normal visual acuity or were corrected to 20/20 visual acuity using a refractive lens. Only two participants (one male and one female) used optical corrections of −0.5 and −0.7 diopters, and we considered that any imprecision of their OCT measurements had little or no influence on the results. Participants with neurological, systemic, eye, or retinal diseases that affected the structure or function of the visual system were excluded.

OCT imaging
Retinal OCT imaging was performed using the Spectralis HRA + OCT system (Heidelberg Engineering GmbH, Heidelberg, Germany). Each session consisted of a 25-line horizontal raster scan in a 20° × 20° area centered on the fovea, followed by 24 automated real-time repetitions. The Heidelberg Eye Explorer software (Heidelberg Engineering GmbH, Heidelberg, Germany) was used to segment retinal layers [total retina (TR), retinal nerve fiber layer (RNFL), ganglion cell layer (GCL), inner plexiform layer (IPL), inner nuclear layer (INL), outer plexiform layer (OPL), outer nuclear layer (ONL), and retinal pigmented epithelium (RPE)] and three combinations of retinal layers [overall retinal, outer retinal layers (ORL), which range from the external limiting membrane to Bruch's membrane, and inner retinal layers (IRL), which range from the inner limiting membrane to the external limiting membrane]. The thickness and volume of each layer were quantified. Visual inspection of the segmentation was performed to avoid possible errors. The outcome of the image segmentation of retinal layers was the mean thickness of nine macular subfields (central, nasal inner, temporal inner, superior inner, inferior inner, nasal outer, temporal outer, superior outer, and inferior outer), following the Early Treatment Diabetic Retinopathy Study (ETDRS) grid. The volume of each layer was also extracted.
For each participant, the examination was performed by the same operator following the manufacturer's guidelines. Two images were obtained in sequence for each eye on the same day. The first image was used as a reference to scan the same parts of the retina during the second image (device's follow-up mode). The thickness of both images was averaged for subsequent analysis. Data were acquired from 128 eyes with the Spectralis HRA + OCT system, and 64 eyes were randomly selected for analysis.

Machine learning algorithms
Prior to the application of ML algorithms, a bootstrap resampling method was employed, utilizing 200 replications for each feature derived from OCT readings. A total of 10 features were used for each retinal layer, comprising nine subfield thicknesses and the volume of the retinal layer. Python scripts were utilized for data analysis and normalization, feature selection, and the execution of ML algorithms through the training and testing phases. The performance of the ML algorithms was subsequently evaluated.
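The text does not detail how the bootstrap was organized, so the following is only a minimal sketch under stated assumptions: rows (participants) are resampled with replacement within each sex group, each replicate has the size of the original group, and the thickness values are synthetic stand-ins. The function name `bootstrap_replicates` and the numeric means and spreads are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in data: 26 male and 38 female participants, each with
# 10 features for one retinal layer (9 ETDRS subfield thicknesses + volume).
male = rng.normal(loc=320.0, scale=15.0, size=(26, 10))
female = rng.normal(loc=310.0, scale=15.0, size=(38, 10))


def bootstrap_replicates(group, n_replications, rng):
    """Resample rows with replacement; each replicate keeps the group size."""
    n = group.shape[0]
    idx = rng.integers(0, n, size=(n_replications, n))
    return group[idx]  # shape: (n_replications, n, n_features)


# 200 replications, as stated in the text.
male_boot = bootstrap_replicates(male, 200, rng)
female_boot = bootstrap_replicates(female, 200, rng)

# Mean of each feature within every replicate, e.g. to inspect its spread.
male_boot_means = male_boot.mean(axis=1)      # shape: (200, 10)
female_boot_means = female_boot.mean(axis=1)  # shape: (200, 10)
```

Resampling within each group keeps the male and female sample sizes fixed across replicates; other schemes (for example, resampling each feature independently) are equally compatible with the description.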
We utilized the StandardScaler function from the sklearn.preprocessing package to standardize the features into standard deviation units, as shown in Equation 1.

Standardized_feature = (feature − mean)/standard_deviation (Equation 1)
The standardized features were used to train and test seven supervised ML algorithms. The sklearn.neighbors.KNeighborsClassifier function was employed to implement the k-nearest neighbors (kNN) algorithm, utilizing the Minkowski distance and a k-value within the range of 5-10. The optimal k-value, which yielded the highest accuracy, was determined using the GridSearchCV function.
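As an illustration of the standardization and the kNN search described above, the sketch below chains StandardScaler and KNeighborsClassifier and lets GridSearchCV pick k from 5 to 10. The data are synthetic, and the 5-fold cross-validation and the use of a Pipeline are assumptions made here for the example; the text does not state how the grid search was cross-validated.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in for one retinal dataset: 64 eyes x 10 features.
X = np.vstack([
    rng.normal(320.0, 15.0, size=(26, 10)),  # hypothetical male group
    rng.normal(305.0, 15.0, size=(38, 10)),  # hypothetical female group
])
y = np.array([1] * 26 + [0] * 38)  # 1 = male, 0 = female

# Equation 1: each feature is centered and divided by its standard deviation.
X_std = StandardScaler().fit_transform(X)

# kNN with Minkowski distance; k is searched over 5..10 by accuracy.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("knn", KNeighborsClassifier(metric="minkowski")),
])
param_grid = {"knn__n_neighbors": list(range(5, 11))}
search = GridSearchCV(pipeline, param_grid, scoring="accuracy", cv=5)  # cv=5 is assumed
search.fit(X, y)

best_k = search.best_params_["knn__n_neighbors"]
print("best k:", best_k, "cross-validated accuracy:", round(search.best_score_, 3))
```

Placing the scaler inside the pipeline means the mean and standard deviation are re-estimated on each training fold, which avoids leaking information from the held-out fold into the standardization.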
The support vector classifier (SVC) was implemented with the sklearn.svm.SVC function using the radial basis function kernel. The gamma and C parameters were set to 1 and 10, respectively.
The sklearn.ensemble.RandomForestClassifier function was used to implement random forest (RF), with the parameters set as follows: "criterion" was set to "gini" (Gini impurity), "n_estimators" was set to 50, and "max_depth" was set to 6.
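The SVC and RF settings reported above map directly onto scikit-learn constructor arguments. The sketch below instantiates both classifiers with those values and fits them to synthetic standardized data; the data and the `random_state` value are assumptions added only to make the example reproducible.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Synthetic stand-in for one retinal dataset: 64 eyes x 10 features.
X = np.vstack([
    rng.normal(320.0, 15.0, size=(26, 10)),  # hypothetical male group
    rng.normal(305.0, 15.0, size=(38, 10)),  # hypothetical female group
])
y = np.array([1] * 26 + [0] * 38)  # 1 = male, 0 = female
X_std = StandardScaler().fit_transform(X)

# Support vector classifier: RBF kernel, gamma = 1, C = 10.
svc = SVC(kernel="rbf", gamma=1, C=10)

# Random forest: Gini impurity, 50 trees, maximum depth of 6.
rf = RandomForestClassifier(
    criterion="gini",
    n_estimators=50,
    max_depth=6,
    random_state=0,  # assumed; not reported in the text
)

svc.fit(X_std, y)
rf.fit(X_std, y)

print("SVC training accuracy:", svc.score(X_std, y))
print("RF training accuracy:", rf.score(X_std, y))
```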
The accuracy of ML algorithms in correctly classifying the data was evaluated (Equation 2).
Accuracy = (true positives + true negatives)/total (Equation 2)
True positives represent the data points correctly classified as male, while true negatives denote those accurately identified as female. The total refers to the overall number of data points.
The ShuffleSplit function from the Scikit-learn library (version 0.21.3) was utilized to divide the data, allocating 70% for model training and 30% for model testing.
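A minimal sketch of the 70/30 split and of Equation 2 follows. It uses logistic regression (one of the seven algorithms named in this study) on synthetic data; the number of splits (`n_splits=10`), the `random_state`, and the choice to fit the scaler on the training portion only are assumptions for illustration and are not reported in the text.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import ShuffleSplit
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in for one retinal dataset: 64 eyes x 10 features.
X = np.vstack([
    rng.normal(320.0, 15.0, size=(26, 10)),  # hypothetical male group
    rng.normal(305.0, 15.0, size=(38, 10)),  # hypothetical female group
])
y = np.array([1] * 26 + [0] * 38)  # 1 = male (positive), 0 = female (negative)


def accuracy(y_true, y_pred):
    """Equation 2: (true positives + true negatives) / total."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    return (tp + tn) / (tp + tn + fp + fn)


# 70% of the data for training and 30% for testing in every random split.
splitter = ShuffleSplit(n_splits=10, test_size=0.3, random_state=0)

train_accuracies, test_accuracies = [], []
for train_idx, test_idx in splitter.split(X):
    scaler = StandardScaler().fit(X[train_idx])
    model = LogisticRegression(max_iter=1000)
    model.fit(scaler.transform(X[train_idx]), y[train_idx])
    train_accuracies.append(
        accuracy(y[train_idx], model.predict(scaler.transform(X[train_idx])))
    )
    test_accuracies.append(
        accuracy(y[test_idx], model.predict(scaler.transform(X[test_idx])))
    )

print("mean training accuracy:", np.mean(train_accuracies))
print("mean testing accuracy:", np.mean(test_accuracies))
```

Keeping both the training and the testing accuracy for each split makes it possible to compare the two stages afterwards, as done in the Results.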

Statistics
We used a t-test to compare the thickness of the different datasets obtained from both eyes of male and female subjects and to later carry out an intergroup comparison of retinal layer thickness. We conducted a one-way ANOVA to evaluate the influence of the macular field on retinal thickness, as well as a two-way ANOVA to evaluate the influence of the classifier type and retinal dataset factors on the accuracies (model training and model testing) of the classifiers. For multiple comparisons, we employed the Tukey HSD post-hoc test. We compared the accuracies of the model training and model testing using a t-test for repeated measures. A significance level of 5% was applied for the statistical comparisons.
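To make the two-way ANOVA on classifier accuracies concrete, the sketch below computes the sums of squares (SS), degrees of freedom (DF), mean squares (MS), F, and p for a balanced design with replication. It is an illustrative implementation written with NumPy and SciPy, not the software used in the study; the function name `two_way_anova`, the 7 × 8 × 10 layout, and the accuracy values are hypothetical.

```python
import numpy as np
from scipy import stats


def two_way_anova(data):
    """Balanced two-way ANOVA with replication.

    data has shape (a, b, n): a levels of the first factor (classifier),
    b levels of the second factor (retinal dataset), n replicates per cell.
    """
    a, b, n = data.shape
    grand = data.mean()
    mean_a = data.mean(axis=(1, 2))  # mean per classifier
    mean_b = data.mean(axis=(0, 2))  # mean per retinal dataset
    cell = data.mean(axis=2)         # mean per classifier x dataset cell

    ss = {
        "classifier": b * n * np.sum((mean_a - grand) ** 2),
        "dataset": a * n * np.sum((mean_b - grand) ** 2),
        "interaction": n * np.sum(
            (cell - mean_a[:, None] - mean_b[None, :] + grand) ** 2
        ),
        "error": np.sum((data - cell[:, :, None]) ** 2),
    }
    df = {
        "classifier": a - 1,
        "dataset": b - 1,
        "interaction": (a - 1) * (b - 1),
        "error": a * b * (n - 1),
    }
    ms_error = ss["error"] / df["error"]

    table = {}
    for effect in ("classifier", "dataset", "interaction"):
        ms = ss[effect] / df[effect]
        f_value = ms / ms_error
        p_value = stats.f.sf(f_value, df[effect], df["error"])
        table[effect] = {"SS": ss[effect], "DF": df[effect], "MS": ms,
                         "F": f_value, "p": p_value}
    table["error"] = {"SS": ss["error"], "DF": df["error"], "MS": ms_error}
    return table


# Hypothetical accuracies: 7 classifiers x 8 retinal datasets x 10 replicates,
# with accuracy made to decline across datasets for illustration.
rng = np.random.default_rng(0)
acc = rng.normal(0.85, 0.03, size=(7, 8, 10))
acc -= np.linspace(0.0, 0.15, 8)[None, :, None]

table = two_way_anova(acc)
for effect, row in table.items():
    print(effect, {key: round(float(value), 4) for key, value in row.items()})
```

The closed-form sums of squares used here hold only when every cell has the same number of replicates; an unbalanced design would need a regression-based ANOVA instead.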

Inter-eye comparison of the retinal thickness for male and female subjects
To ensure that the selection of the eye did not introduce any bias, we conducted a comparison of the thickness of various retinal layers between the right and left eyes of participants of both sexes. Our analysis revealed that no significant differences were observed in any of the retinal layers between the eyes. Based on these findings, we opted to randomly select one eye from each participant for data extraction concerning retinal thickness. Table 1 displays the comparison of retinal thickness in the various datasets obtained from both eyes within the sample.
We randomly selected one eye to extract retinal thickness and compared this feature between male and female groups, as depicted in Table 2. Our findings indicated significant differences in the total retina and layers comprising information from the inner retina (RNFL, GCL, IPL, INL), with the male group exhibiting greater thickness compared to the female group (p < 0.01). Conversely, no significant differences were discerned in the layers within the outer retina (OPL, ONL, RPE; p > 0.01).
In the intergroup comparison, considering the thickness of different macular fields (Table 3), we observed that in datasets representing the total retina and data from the inner retina, the male group had thicker tissues across all fields than the female group (p < 0.01). However, in the datasets from the outer retina, we observed a predominance of non-significant differences.

Machine learning accuracies during model training
Table 4 presents the mean accuracies (± standard deviation) derived from model training across various classifiers and retinal datasets. The results of a two-way ANOVA revealed significant effects attributed to the algorithm factor, the retinal dataset factor, and the interaction between these two factors, as summarized in Table 5. Notably, post hoc multiple comparisons demonstrated that the accuracies achieved by all algorithms were markedly superior when utilizing the total retina dataset and datasets originating from the inner retina (RNFL, GCL, IPL, INL), as compared to datasets from the outer retina (OPL, ONL, and RPE).
In evaluating the accuracies of different algorithms across the diverse retinal datasets, multiple comparisons indicated a notable absence of significant differences in algorithm performance within the total retina dataset and the inner retina datasets (p > 0.05). Conversely, in the OPL dataset, it was evident that random forest (RF), support vector classifier (SVC), and decision tree (DT) exhibited significantly higher accuracies when contrasted with other algorithms. Similarly, in the ONL dataset, random forest and decision tree outperformed their counterparts. Notably, in the RPE dataset, random forest demonstrated the highest accuracy among all algorithms.

Machine learning accuracies during model testing
Table 6 displays the mean accuracies (± standard deviation) derived from model testing across various classifiers and retinal datasets. Once again, the results of a two-way ANOVA revealed significant effects associated with the algorithm factor, the retinal dataset factor, and their interaction (as summarized in Table 7). Post hoc multiple comparisons further substantiated that, much like the training model, all algorithms achieved significantly higher accuracy levels when employing the total retina dataset and datasets from the inner retina, in comparison to the datasets from the outer retina. Consistent with the training model, the results of multiple comparisons within the total retina dataset and datasets from the inner retina indicated an absence of significant differences in algorithm accuracies (p > 0.05). In contrast, concerning the outer retina, random forest (RF) exhibited notably higher accuracy compared to other algorithms (p < 0.05).

Comparison of the accuracies estimated for the models in the training and testing stages
The comparison of the accuracies calculated for the models in the training and testing stages showed that 10.8% of the comparisons had significant differences, and in all of them the accuracy was higher in training than in testing (Figure 1).
After finding that the random forest classifier outperformed other methods in classifying the datasets, we examined feature importance scores, which indicate the extent to which each feature influences the model's predictions. In random forest, these scores are based on the Gini impurity: a feature's score reflects how much the splits made on that feature reduce impurity across the decision trees. Figure 2 displays the feature importance scores for macular thickness in different fields. We conducted a one-way ANOVA to assess the impact of the macular field on the feature importance score for each dataset. We found that in all datasets, there were significant differences (p < 0.01), with one or more fields having a greater importance than others in the classification decision.
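For readers who want to reproduce this kind of analysis, scikit-learn exposes impurity-based importance scores through the `feature_importances_` attribute of a fitted RandomForestClassifier. The sketch below is illustrative only: the data are synthetic, the sex-related shift placed on the temporal inner field is an arbitrary assumption, and `random_state` is not reported in the study.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# The 10 features of one retinal dataset: nine ETDRS fields plus volume.
feature_names = [
    "central", "nasal inner", "temporal inner", "superior inner",
    "inferior inner", "nasal outer", "temporal outer", "superior outer",
    "inferior outer", "volume",
]

rng = np.random.default_rng(0)
X = rng.normal(300.0, 15.0, size=(64, 10))  # synthetic thickness values
y = np.array([1] * 26 + [0] * 38)           # 1 = male, 0 = female
X[y == 1, 2] += 20.0  # assumed shift in the temporal inner field only

rf = RandomForestClassifier(
    criterion="gini", n_estimators=50, max_depth=6, random_state=0
)
rf.fit(X, y)

# Impurity-based importances: normalized so that they sum to 1.
importances = rf.feature_importances_
ranking = sorted(zip(feature_names, importances),
                 key=lambda pair: pair[1], reverse=True)
for name, score in ranking:
    print(f"{name:>15s}: {score:.3f}")
```

Repeating the fit over bootstrap replicates or random splits would give a distribution of scores per field, which is the kind of input a one-way ANOVA across macular fields requires.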

Discussion
This study's findings reveal significant patterns in the classification accuracy of sex-specific data, utilizing various retinal layers and ML algorithms. The most reliable accuracies for distinguishing between male and female participants were observed when analyzing data from the total retinal structure and the retinal nerve fiber layer. These results suggest that these retinal layers possess unique sex-related characteristics that were effectively identified by the employed ML techniques. Interestingly, the highest classification accuracies were consistently achieved using these retinal layers, yet no statistically significant differences were detected among the accuracies derived from the various ML algorithms used in this study. This suggests that the algorithms performed consistently when tasked with sex classification based on retinal data, regardless of their inherent methodologies. Moreover, classification accuracies decreased as the analysis moved toward the outer retinal layers. Additionally, some algorithms demonstrated statistically significant deviations from others in terms of classification accuracy. Notably, the RF algorithm displayed higher accuracies compared to the others in this context.
While the sex of a patient is typically known during a consultation, it is not always evident whether the retinal thickness of that patient aligns with the sex-based patterns expected. Comparing a patient's retinal thickness to sex-based population norms can be a valuable tool in evaluating the patient. However, alternative approaches, such as machine learning, can complement conventional statistical methods. For instance, our study revealed that, even in retinal layers where there were no significant differences in thickness between the male and female groups, such as the datasets from the outer retina, we achieved a sex classification accuracy exceeding 75%. What would it signify if a male patient were classified as female based on retinal thickness patterns, or vice versa? It is crucial to emphasize that this classification does not pertain to the patient's actual sex but rather reflects the retinal thickness patterns expected for each sex. The clinical implications of a disparity between a patient's actual sex and a different sex classification based on retinal structure remain unclear, but further investigations may shed light on this question.
An investigation has previously been conducted using a deep learning method to predict sex through macular OCT images (24). It showed that the differences between male and female subjects might not be uniform throughout the macula. The best accuracy in separating data from male and female subjects occurred in the central fovea (around 75%) and lower accuracy was found in the external limit of the fovea (around 70%). They also fed models considering different macular sectors and found non-uniformity in the accuracies (ranging between 52 and 62%). The data they used are comparable to the total retina dataset of the present study. We interpreted that our accuracies were higher because we had fed our models with thickness information of all the macular sectors, and they used information from each sector for their classification. Taking into account the significance of macular field thickness, our results align with the findings achieved using deep learning approaches for the total retinal dataset, wherein the temporal fields were identified as the most crucial for classifying sex. The current study also revealed that in other retinal layers, the field of greatest importance varied.
The difference between the accuracies of the training and testing models is a crucial aspect in the evaluation of machine learning models. This difference can provide insights into how well the model is generalizing to unseen data, which is essential for determining the model's robustness. In the current study, the vast majority of comparisons showed no significant discrepancy between the training and testing accuracies, which is a positive indication. It suggests that the model, which fits the training data well, also exhibits good generalization to new data. This alignment between training and testing accuracies indicates that the model is not overfitting the training data and has the potential for reliable performance on new, unseen data.
The superior performance of random forest in achieving higher accuracies compared to alternative machine learning algorithms in our study can be attributed to several key advantages of this ensemble learning technique. Random forest harnesses the power of multiple decision trees, where each tree is trained on a different subset of the data and with feature randomness (25). This inherent diversity and randomness help mitigate overfitting, a common challenge in machine learning, by reducing the model's sensitivity to noise and outliers (26). Moreover, random forest's ability to handle both classification and regression tasks, its capacity to capture complex non-linear relationships in the data, and its robustness to multicollinearity make it particularly well-suited for a wide range of datasets (27). Additionally, the ensemble nature of random forest allows it to aggregate the predictions from multiple trees, reducing the risk of bias that can be associated with individual models. Consequently, the comprehensive nature of random forest, combining predictive power and robustness, positions it as an attractive choice for achieving high accuracy in diverse machine learning tasks.
Prior research has suggested that male participants typically display a greater retinal thickness compared to female participants (7-12). The impact of sex on retinal layers is still a topic of ongoing debate. Some studies (13-17) have reported thicker retinal layers in male subjects (GCL, IPL, INL, OPL, and ONL), while others have observed minimal or no sex-related differences (18, 19). Some studies have shown that female subjects had a thicker peripapillary RNFL than male subjects (28, 29). The present study uncovers a greater thickness in the inner retinal layers of male subjects compared to female subjects. Sexual hormones interacting with receptors such as estrogen and androgen receptors can affect ocular tissue. However, despite their influence on various ocular structures, the effect of these hormones on retinal layer thickness remains largely uninvestigated (30-35).
Neglecting to account for sex differences in comparisons of retinal thickness between healthy individuals and patients could result in erroneous diagnoses, particularly for inner retinal diseases that display substantial sex-related disparities. Conditions like glaucoma, macular holes, diabetic retinopathy, and age-related macular degeneration demonstrate varying prevalence rates between male and female subjects. This is likely attributable to changes in sex hormone concentrations after the age of 50 (36, 37).
The current investigation recruited predominantly young adult participants, and as a result, the applicability of our findings may be limited to this specific age group. This demographic constraint represents a notable limitation of our study. To enhance the generalizability and robustness of our conclusions, it is imperative for future research endeavors to encompass a broader spectrum of cases, incorporating individuals from various age ranges. In the present study, our primary aim was to demonstrate that various models can learn pertinent sex-related patterns within diverse retinal datasets. While the current sample size has proven adequate for this initial validation, it remains a limitation of the study and should be expanded in future research endeavors.
In conclusion, this research highlights the discriminative capacity of different retinal layers in sex classification, achieving varying levels of accuracy across distinct layers and ML algorithms.The consistently superior performance of the RF algorithm indicates its effectiveness in identifying sex-related characteristics in various retinal layers.Furthermore, the identified patterns of accuracy fluctuations across retinal layers offer invaluable insights for subsequent research and algorithmic advancement in the field of retinal data analysis. 10.3389/fmed.2023.1275308

FIGURE 1 Comparison of the algorithm accuracies calculated in the model training and model testing in the different retinal datasets. * p < 0.05.

FIGURE 2 Comparison of the feature importance score obtained from random forest algorithm to classify the sex of the participant based on retinal thickness from different datasets. The color code is indicated at the bottom of the figure. * p < 0.05.

TABLE 1
Comparison of the retinal thickness obtained from both eyes of male and female groups.

TABLE 3
Comparison of retinal dataset thickness in the different macular fields from measurements obtained from both groups.

TABLE 4
Comparison of mean accuracies (± standard deviation) obtained from the machine learning algorithms to classify the sex-related differences in the retinal layers (and total retina) for model training.TR, total retina; RNFL, retinal nerve fiber layer; GCL, ganglion cell layer; IPL, inner plexiform layer; INL, inner nuclear layer; OPL, outer plexiform layer; ONL, outer nuclear layer; RPE, retinal pigmented epithelium; SVC, support vector classification; GNB, Gaussian Naïve Bayes; RF, random forest; kNN, k-nearest neighbors; LR, logistic regression; LD, linear discriminant; DT, decision tree.

TABLE 5
Two-way ANOVA results for model training.
SS, sum of squares; DF, degrees of freedom; MS, mean squares.

TABLE 6
Comparison of mean accuracies (± standard deviation) obtained from the machine learning algorithms to classify the sex-related differences in the retinal layers (and total retina) for model testing.

TABLE 7
Two-way ANOVA results for model testing.