Automatic Artery/Vein Classification Using a Vessel-Constraint Network for Multicenter Fundus Images

Retinal blood vessel morphological abnormalities are generally associated with cardiovascular, cerebrovascular, and systemic diseases, automatic artery/vein (A/V) classification is particularly important for medical image analysis and clinical decision making. However, the current method still has some limitations in A/V classification, especially the blood vessel edge and end error problems caused by the single scale and the blurred boundary of the A/V. To alleviate these problems, in this work, we propose a vessel-constraint network (VC-Net) that utilizes the information of vessel distribution and edge to enhance A/V classification, which is a high-precision A/V classification model based on data fusion. Particularly, the VC-Net introduces a vessel-constraint (VC) module that combines local and global vessel information to generate a weight map to constrain the A/V features, which suppresses the background-prone features and enhances the edge and end features of blood vessels. In addition, the VC-Net employs a multiscale feature (MSF) module to extract blood vessel information with different scales to improve the feature extraction capability and robustness of the model. And the VC-Net can get vessel segmentation results simultaneously. The proposed method is tested on publicly available fundus image datasets with different scales, namely, DRIVE, LES, and HRF, and validated on two newly created multicenter datasets: Tongren and Kailuan. We achieve a balance accuracy of 0.9554 and F1 scores of 0.7616 and 0.7971 for the arteries and veins, respectively, on the DRIVE dataset. The experimental results prove that the proposed model achieves competitive performance in A/V classification and vessel segmentation tasks compared with state-of-the-art methods. Finally, we test the Kailuan dataset with other trained fusion datasets, the results also show good robustness. To promote research in this area, the Tongren dataset and source code will be made publicly available. The dataset and code will be made available at https://github.com/huawang123/VC-Net.


INTRODUCTION
Retinal blood vessels have attracted widespread research efforts as these vessels represent the only internal human vascular structures that can be observed noninvasively. Retinal vessel abnormalities reflect the cumulative damage caused by chronic diseases such diabetes and hypertension and represent an important risk indicator for many systemic and cardiovascular diseases (Wong et al., 2004). And the artery/vein (A/V) may be affected differently by variations in disease types and progression. For example, artery narrowing is mostly associated with arterial hypertension, whereas vein widening is related to increased brain pressure, stroke, and similar cardiovascular diseases. Hence, accurate image-based analysis and evaluation methods for the morphological evaluation of A/V changes might give an early insight and a deeper understanding of the pathophysiology of such diseases. The A/V caliber ratio (Wong et al., 2004) has been used as a predictor for cardiovascular diseases. Current clinical methods for retinal vessel segmentation and A/V classification mainly rely on manual segmentation. However, due to the high complexity and diversity of vessel structures, manual segmentation brings inevitable shortcomings, including being time-consuming and laborious, having interrater variability and subjectivity, and having lower efficiency and accuracy. Thus, automatic methods for A/V classification and vessel segmentation are highly desirable in clinical settings. The advantages and disadvantages of current clinical methods and automatic methods are shown in Figure 1.
In recent years, several automated techniques have been proposed for retinal A/V classification (Ishikawa et al., 2005;Fraz et al., 2012a;Orlando et al., 2017). These techniques may be categorized into graph-based (Dashtbozorg et al., 2014;Joshi et al., 2014;Estrada et al., 2015;Hu et al., 2015;Pellegrini et al., 2018;Srinidhi et al., 2019) and feature-based (Niemeijer et al., 2009;Zamperini et al., 2012;Mirsharif et al., 2013;Xu et al., 2017;Huang et al., 2018a,b) techniques. Yet, in graph-based approaches, difficulties may be encountered when some vascular regions cannot be segmented, and hence, vessel segments cannot be reliably linked (Welikala et al., 2017). Besides, for featurebased techniques, most recent studies use a two-stage approach for retinal A/V classification. Vessels are firstly segmented from the background; next, the segmented vessels are categorized into arteries and veins by using purely handcrafted features in feature-based methods or by merging edge information in graphbased methods. However, the two-stage approach suffers from the heavy dependence of the A/V classification outcomes on vessel segmentation accuracy. In fact, if the accuracy of blood vessel segmentation is low in the first stage, the A/V classification results will not be good either in the second stage.
With the development of deep learning, many convolutional neural network-based methods have been proposed for joint vessel segmentation and A/V classification. Xu et al. (2018) adopted an improved fully convolutional network (FCN) architecture to segment retinal arteries and veins simultaneously. This method enabled end-to-end multilabel segmentation of color fundus images. AlBadawi and Fraz (2018) proposed an FCN architecture with an encoder-decoder structure for pixel-based A/V categorization. Meyer et al. (2018) also adopted the FCN architecture for A/V classification and demonstrated high performance on major vessels with thicknesses of more than three pixels. Hemelings et al. (2019) proposed a novel FCN-based U-Net architecture for simultaneous blood vessel semantic segmentation and A/V discrimination. Ma et al. (2019) proposed an enhanced deep architecture with a spatial activation mechanism for joint vessel segmentation and A/V identification. Li et al. (2020) made a highly confident prediction about the peripheral vessels by taking the structural information among vessels into account with post-processing.
However, automatic vessel segmentation and A/V classification are still considered difficult tasks due to the following challenges: (1) The multiscale structure of blood vessels is easily overlooked. These methods focus on large-scale structures such as thick blood vessels, but the performance is poor for small-scale structures such as the edge and the end of thick blood vessels, as shown in Figure 2D.
(2) There is extreme imbalance between positive samples (blood vessels) and negative samples (non-vessel areas) in retinal fundus images, where blood vessels account for only 15% of the whole image. Correspondingly, the proportion of arteries and veins is only about 7.5% each. As a result, directly classifying the pixels of the retinal image as background, artery, and vein pixels is very challenging. (3) Distinguishing between arteries and veins can be highly confusing. The results of the aforementioned methods still show poor localization performance between arteries and veins; for example, the same blood vessel may be half recognized as an artery and half as a vein, as shown in Figure 2B. (4) The choroid is similar to blood vessels and is easy to misclassify.
Besides, most of these existing methods are only validated on specific datasets. However, in clinical applications, the performance would underperform when tested on datasets with a different image resolution, imaging equipment, and population. For example, when generalizing a trained model to datasets with different center scales, the performance of the model usually deteriorates. The characteristic differences of retinal fundus images among different scales will also influence the segmentation results. One possible solution to this problem is labeling some samples of the new dataset to fine-tune the pretrained model, but this process is expensive and time-consuming.
In order to alleviate these challenges, in this work, we introduce a novel convolutional neural network for joint A/V classification and vessel segmentation in retinal fundus images, named the vessel-constraint network (VC-Net). Firstly, in order to alleviate challenge (1), the VC-Net employs a vessel-constraint (VC) module to enhance the microvessels and the edge of thick vessels by using Gaussian kernel function probability maps to enhance the feature weights of the blood vessel edge area. And the multiscale feature (MSF) module is proposed to extract and express blood vessel features at different scales in the encoder. Secondly, in order to alleviate challenges (2)-(4), the VC module combines the global and local vessel information to generate a weight map to constrain the A/V features, which suppresses the background features. Not only can this alleviate the imbalance of positive and negative samples, but this also pays more attention to the features of arteries and veins to achieve better A/V classification performance.
The key contributions of this study can be highlighted as follows: • For the first time, we propose a VC-Net that uses vessel probability information to constrain A/V and enhance learning of discriminative A/V features. In addition, the VC-Net can also get blood vessel segmentation results simultaneously. • The newly designed VC module is powerful in A/V feature extraction. The VC module is used to capture the distribution information of vessels as a weight to constrain the A/V features, which suppresses background-prone features to pay more attention to vessel features. Data fusion (DF) alleviates well the problem of imbalance between positive and negative samples and helps us learn more discriminative A/V features. At the same time, the VC module enhances the microvessels and the edge of thick vessels by using Gaussian kernel function probability maps to improve the feature weights of the blood vessel edge area. • The MSF module of multiscale DF is proposed to extract and express blood vessel features at different scales in the encoder, where the diameters of the main vessels and microvessels vary greatly. The DF training strategy is applied to improve the robustness of the model by fusing information from datasets with different scales. • We publicly released the Tongren dataset with ground truth annotation. The lack of retinal fundus image data with annotated label impedes further exploration of retinal vesselrelated researches such as vessel segmentation and A/V classification in the deep-learning community. Therefore, we established a dataset to promote these studies with a detailed data description in the experimental setup section of this paper.
The rest of this paper is organized as follows. Firstly, we present the details of our proposed methodology in Section "Materials and Methods". Then, the descriptions of the datasets and the experimental details are described in Section "Experimental Setup". Next, our experimental results are presented in Section "Results". Finally, the discussions and conclusions follow in Section "Discussion and Conclusion."

MATERIALS AND METHODS
The design details of the VC-Net are shown in Figure 3. Firstly, we propose a VC module to capture the DF feature of the distribution and edge information of the vessel and enhance the microvessel and the edge of thick vessels. Then the distribution and enhanced information are utilized as weights to activate the A/V features and enforce the A/V classification module to focus more on vessels and help us learn more discriminative A/V features, for extremely unbalanced vessel and background. In addition, we used the MSF module to extract blood vessel features at different scales for varied diameters of the main vessels and microvessels.

VC Module
In a retinal fundus image, blood vessels typically account for about 15% of the full image. Consequently, the area proportion of the arteries and veins is only about 7.5% each. Hence, directly classifying the retinal image pixels into background, artery, vein, and undecided pixels is a significant challenge task due to the high-class imbalance and the scarcity of training samples. To alleviate this problem, we designed a VC module at the end of the framework to enhance A/V classification.
The VC module combines the local and global vessel information to generate a weight map to constrain the A/V features, which suppresses the background-prone features to pay more attention to vessel features. In this way, it can alleviate the problem of severe imbalance between positive and negative samples. At the same time, we introduced Gaussian kernel function probability maps to improve feature weights of microvessels and the blood vessel edge area, thereby enhancing the feature representation of microvessels and the edge of thick vessels. The Gaussian activation function of the VC module is defined as Where x belongs to the probability map of the whole blood vessel segmentation, with values between 0 and 1, and α is a fixed parameter (set to 1 in this experiment). The function F(x) further focuses on local vessel information, such as vascular boundaries and microvascular areas. Based on experimental observations and earlier studies, the probability of misclassifying vessel pixels is essentially concentrated around 0.5. These misclassified pixels come either from the vesselbackground boundary or the microvascular areas whose features are not obvious and difficult to distinguish from the background. The background and thick vessel pixels have a value near 0 or 1.
Through the function F(x), the activation weight value of a pixel with a probability close to 0.5 was increased to [α(1e −0.5 ) + 1], while the activation weight values of the background and main thick vessels were set close to 1. The activation function constrains the activation weight value to be within [1, α(1e −0.5 ) + 1]. Note that F(0.5 + x 1 ), x 1 ∈ [0,0.5].

Multiscale Feature
As shown in Figure 4, the scale of blood vessels varies greatly in retinal fundus images. On the one hand, the average artery diameter is generally slightly smaller than the average vein diameter. On the other hand, the average diameter for the main blood vessels is much larger than that of the capillaries. Therefore, we use the capabilities of the pre-trained Res2Net  model to learn and understand the retinal vessel image features at different scales in the encoder stage. Instead of extracting features using 3 × 3 filter groups as in the ResNet (He et al., 2016) bottleneck block (Figure 5A), smaller filter groups connected in a hierarchical residual-type manner are used ( Figure 5B). After the 1 × 1 convolutional stage, the features are split into k subsets, where the ith subset is denoted by x i , where i ∈ {1, 2,. . ., k}. While all subsets have the same spatial size, the channel count for each subset is 1/k times that of the input feature map. Each subset x i (except for x 1 ) has a 3 × 3 convolutional filter F i (). Thus, the filter output y i can be written as Each 3 × 3 convolutional operator F i () might get information from all feature subsets {x j , j ≤ i}. When a feature subset x j is processed by a 3 × 3 convolutional operator, the output result may have an enlarged receptive field compared to x j . Here, the scale dimension k is used as a control parameter. A larger k value enables learning features with larger receptive field sizes, with insignificant computation and memory overheads due to concatenation.

Loss Function
We employ an end-to-end deep-learning scheme as our underlying framework. The A/V loss is quantified by the  commonly used cross-entropy loss function While the vessel segmentation loss is quantified by the binary cross-entropy Where n denotes the number of pixels in the input image, y' is the predicted output probability of a foreground pixel, y is the ground-truth pixel label, and c denotes the cth class of the output. The total loss is defined as Where γ = 0.6, δ = 0.4, and δ + γ = 1. We use L 2 regularization with a weight of β = 0.0002.

EXPERIMENTAL SETUP
In this section, we describe the used retinal image datasets, the evaluation metrics for retinal vessel segmentation and A/V classification, and the VC-Net training details.

Datasets
In this work, we evaluated our approach and assessed its clinical applicability on five retinal fundus image datasets of different scales. Three datasets are publicly available while the other two were collected by authors. In order to validate the generalization performance and robustness of our method by DF experiments, we specifically annotated two multiscale datasets. An overview of these datasets is given in Table 1.

DRIVE
Our model was firstly trained and tested on the publicly available DRIVE database (Hu et al., 2013). This database contains 40 color retinal fundus images with image dimensions of 584 × 565 pixels. These images were evenly divided into training and test sets with 20 images in each set. Pixel-wise labeling is provided for vessel segmentation and A/V classification.

LES
The LES dataset (Orlando et al., 2018) contains 22 images with a 30 • field of view (FOV) and a resolution of 1,444 × 1,620 pixels for 21 images and a 45 • FOV and a resolution of 1,958 × 2,196 pixels for one image. The images are equally divided into training and test sets with 11 images in each set.

HRF
The HRF dataset (Odstrcilik et al., 2013) contains 45 images equally divided among three categories, namely, healthy subjects, patients with diabetic retinopathy, and patients with glaucoma. Images were captured with an FOV of 60 • and a pixel resolution of 3,304 × 2,336. Only one ground-truth segmentation map is available for each image. For each category, five images are used for training and the rest are used for testing.

Tongren
The Tongren clinical dataset contains 30 representative retinal fundus images with a 45 • FOV and a resolution of 1,888 × 2,816 pixels, within which 20 images were normal and 10 images were of moderate cataract or retinal diseases including glaucoma, age-related macular degeneration, and retinal vein occlusion. An approval was obtained from the Ethics Committee of Beijing Tongren Hospital. The ocular fundus had been taken with a fundus camera (CR6-45NM camera, Canon Inc., Ota, Tokyo, Japan). These images were labeled by two experienced ophthalmologists with the ITK-SNAP toolkit (Yushkevich et al., 2006). For each category, half of the images are used for training, and the rest are used for testing.

Kailuan
The Kailuan database contains 30 images which were collected from participants of the community-based Kailuan Cohort Study (Jiang et al., 2015). These images have different sizes. The minimum, average, and maximum heights are 1,588, 1,902, and 2,112. The minimum, average, and maximum widths are 1,586, 1,901, and 2,112. We used 15 images for training and the rest for testing. Also, these images were labeled by experienced ophthalmologists with the ITK-SNAP toolkit (Yushkevich et al., 2006). The binary ground-truth segmentation maps for the DRIVE, LES, and HRF images are publicly available. For the Tongren and Kailuan images, we have manually created FOV masks using methods similar to those of Soares et al. (2006), Figure 6 shows samples of Tongren and Kailuan datasets.

Evaluation Metrics
The retinal vessel segmentation outcomes of the proposed method were compared against those of other reference methods using several metrics, namely, sensitivity (SE), specificity (SP), accuracy (ACC), the area under the ROC curve (AUC), and the F1 score (F1). The binary segmentation maps were generated through thresholding the probability maps with a 0.5 threshold.
For A/V classification, five performance evaluation metrics were adopted. We interpret arteries as positives and veins as negatives. The A/V sensitivity (SE AV ) and specificity (SP AV ) reflect the model capability for correctly detecting arteries and veins, respectively. The balance accuracy (BACC) quantifies the overall performance of the model. These metrics are defined as follows.
Where TP is the count of the correctly classified artery pixels, TN is the count of the correctly classified vein pixels, FP is the count of the vein pixels misclassified as artery pixels, and FN is the count of the artery pixels misclassified as vein pixels. In addition, we compute the F1 score for arteries (F1 A ) and the F1 score for veins (F1 V ) when arteries and veins represent the relevant samples, respectively. The optimal value for each of these metrics is 1. Computations were restricted to pixels within the FOV.

Network Training Details
Few training samples are available in each of the five databases and are hence insufficient for handling model complexity. To alleviate this problem, several data augmentation strategies (Fraz et al., 2012b;Maninis et al., 2016;Feng et al., 2017;Oliveira et al., 2018;Guo et al., 2019) have been explored, including image scaling with different scale factors and image rotation by different angels. As no prior knowledge is available on the appropriate patch size selection, patches with a size of 512 × 512 were randomly picked from the retinal images and used for network training. For each test image, ordered patches were collected, and the final segmentation and classification outcomes were found by stitching together the associated patch predictions. A stochastic gradient descent algorithm with momentum was employed for optimizing model parameters with a maximum of 4,000 iterations. The learning rate was initially set to 0.001 and then cut in half every 1,500 iterations. Method implementations were carried out using a PyTorch backend the NVIDIA CUDA R Deep Neural Network library (cuDNN 9.0), and an Intel R Xeon R Gold 6148 CPU with a processor of 2.40 GHz, a RAM of 256 GB, and an Ubuntu 16.04 operating system.

RESULTS
In this section, we introduce the results of the experiment. Firstly, we conduct a series of ablation studies to systematically analyze the effectiveness of each component of the proposed network and its impacts on overall segmentation performance. Then, we apply our method to the aforementioned datasets and compare it with state-of-the-art methods. Finally, we verify the effectiveness of the DF strategy to address the challenges in new datasets.

Ablation Studies
Detailed ablation studies have been conducted to evaluate the contribution of each module of the proposed VC-Net architecture. These modules include the basic U-Net module, the MSF in the encoder, and the VC module for A/V classification. In Table 2, the first two methods apply direct recognition of retinal fundus images into background, artery, vein, and undecided pixels. Based on the recognition results, vessel segmentation indicators are calculated. The proposed method was used for vessel segmentation and A/V classification simultaneously; performance indices were calculated accordingly.
As shown in Table 2, the A/V classification results have been significantly improved with the addition of MSF. The MSF can extract and express the vessel features with different scales in the encoder to solve the varying diameters of the main vessels and microvessels. Remarkably, the blood vessel classification performance has been further improved to a certain extent by using the VC module; our results show that we achieved 0.9483, 0.9327, 0.9547, 0.7428, and 0.7880 on BACC, SE AV , SP AV , F1 A , and F1 V , respectively. The VC module can suppress backgroundprone features to pay more attention to vessel features; it alleviates well the problem of positive and negative sample imbalance and helps us learn more discriminative A/V features. At the same time, the VC module can enhance the feature representation of microvessels and the edge of thick vessels. More importantly, from Table 2, we can see that the combination of U-Net, MSF, and VC modules achieves the best results with a BACC of 0.9542, SE AV of 0.9351, SP AV of 0.9732, F1 A of 0.7605, and F1 V of 0.7971. Therefore, the ablation study demonstrates the effectiveness of the proposed modules. As shown in Figure 7, we visualized the A/V classification results for different modules of the proposed VC-Net architecture. In particular, results for four regions of interest were highlighted and magnified. We can see that A/V classification results of the U-Net are poor, where arteries and veins are seriously confused, and that there are many misclassifications at the edges and ends of blood vessels. With the introduction of MSF, the A/V classification results have been improved, but there is still the problem of arteries and veins being confused near the crossing and branching points of blood vessels. Obviously, in comparison with other models, we proposed the VC-Net as it achieves better A/V classification results both locally and globally. The above analysis proves that our model certainly improves the overall A/V classification performance.
As we can see, the VC-Net model outperformed other methods based on performance metrics and visualization results. In addition, we also explored the influence of varying the α parameter values on the VC-Net model performance. Specifically, we trained the model from scratch with different α values, ranging from 0.4 to 1.6. The results are shown in Table 3. For A/V classification, the BACC, SE AV , F1 A , and F1 V metrics increased with the decrease in α. Nevertheless, the increase between α = 0.4 and α = 0.6 was very small, and there was even a small decrease in F1 A and F1 V . For vessel segmentation, α approaches 1.0-1.4, and the indicators show good performance. Therefore, the α value should be adjusted according to different scenarios. If a larger SE AV value is desired, the α value can be appropriately reduced to train a model from scratch.
After training the VC-Net model, we varied the α values in the trained model to test the model performance on the test dataset. The results are shown in Table 4. Obviously, with the decrease of α, most indicators are increased except SE, F1, and F1 V . As a bonus and as SE is increased with α, the effectiveness of the VC model is verified from the side. When a model has been trained, if a larger indicator for A/V classification is needed, α can be appropriately reduced. And if a lager SE is needed, α can be appropriately increased.

Comparison With Existing Methods on the DRIVE Dataset
We compared the VC-Net performance with that of other stateof-the-art methods on the DRIVE dataset for vessel segmentation and A/V classification tasks. Table 5 summarizes the vessel segmentation comparison results. As seen, the proposed VC-Net shows superior segmentation performance in terms of AUC  and F1. In Table 6, the existing methods are evaluated for the classification performance on the segmented vessels only.
On the contrary, we evaluated the VC-Net performance on all A/V ground-truth pixels. This evaluation is more challenging than that on the segmented vessels, since the identification of major vessels is an easier task if the capillary vessels are not segmented. The comparison with existing methods under the same criteria shows superior performance of our model, which achieves a BACC of 0.9554, SE AV of 0.9360, SP AV of 0.9748, F1 A of 0.7616, and F1 V of 0.7964. Indeed, our model surpasses the current best A/V classification method due to the introduction of the VC module.  In particular, for Table 6, it is noteworthy that VC-Net has outperformed existing methods in terms of all metrics in identifying arteries and veins. This performance superiority is mainly due to the fact that the vessel activation map not only enhanced the vascular boundaries and microvessels but also strengthened the main thick vessels, suppressed the background, and hence enabled the model to learn more vessel features. Besides, the vessel activation map eliminated the imbalance between the background and the blood vessel samples to a certain extent.

Comparison With Existing Methods on Other Datasets
The proposed VC-Net was also compared with existing methods on two other public datasets and two collected datasets. For vessel segmentation, the results on the LES and HRF public datasets are shown in Table 7. Clearly, VC-Net achieved significantly better    Table 8. It can be seen that all indicators have been significantly improved compared to those in UA_VA (Galdran et al., 2019) on the LES dataset. In particular, the BACC, SE AV , and SP AV metrics increased by 9.84, 7.10, and 11.38%, respectively. Moreover, the VC-Net also showed excellent performance on the HRF dataset with a BACC of 0.9646. The above results once again demonstrate the excellent performance of VC-Net.
In addition, we tested the VC-Net performance for blood vessel segmentation and A/V classification on the two collected Tongren and Kailuan datasets. The results are shown in Table 9. For the Tongren dataset, there were significant improvements compared with the previous methods. Specifically, the ACC, BACC, F1 A , and F1 V metrics for VC-Net were improved by 0.39, 4.41, 6.43, and 7.44%, respectively, in comparison with the basic U-Net method. And the VC-Net achieved better results with an SP of 0.9767, F1 of 0.7974, F1 A of 0.7221, and F1 V of 0.7741 on the Kailuan dataset. The experimental results demonstrate that our method achieves competitive performance for A/V classification and vessel segmentation.

Segmentation Results of Challenging Images
Sample images from the above-mentioned five databases and the corresponding predicted and ground-truth vessel maps are

Evaluation Results on Unseen Datasets With Multiscale DF
Data fusion is a fundamental step to deal with the new data problem. To improve the robustness of the proposed model, DF from datasets with different scales could enrich the amount of training data and data distribution and could be validated on a new dataset with multiscale. We define this training strategy as DF training. The first three rows of Table 10 were only trained on the DRIVE, LES, or HRF datasets and tested on the Kailuan dataset. Finally, the three datasets combined and shuffled the images. The fused data are used as the training dataset and tested under the Kailuan dataset. It can be seen that the best results have been achieved on most indicators on the Kailuan dataset after DF. As a bonus, the DF training can enhance the robustness of the model, and it is more suitable for testing on new datasets.

DISCUSSION AND CONCLUSION
In this paper, we propose a VC network that utilizes information of vessel distribution and edge to enhance A/V classification. The proposed VC module combines local and global vessel information to generate a more reasonable weight map to constrain the A/V features, which suppresses the backgroundprone features and enhances the edge and end features of blood vessels. Meanwhile, we used an MSF module to obtain multiscale vessel features, such as the main thick vessels, vascular boundaries, and microvascular regions. Our method achieves better blood vessel segmentation and A/V classification performance. More importantly, we adopt the DF strategy to improve the robustness and generalization ability of the proposed model.
The VC-Net model demonstrates the effectiveness on multiscale and multicenter datasets. It outperforms existing methods and achieves state-of-the-art results for A/V classification and vessel segmentation on three public datasets. And the proposed model was tested on multicenter datasets: Tongren and Kailuan; the results indicate the superior generalization capability of the network. In addition, this model shows better performance on datasets with different resolutions. The visualized vessel maps reflect the importance of the MSF extraction module in our model and the excellent overall control of the global and detailed features by the VC module. In particular, to promote the development of this field, we collected two retinal fundus image datasets (Tongren and Kailuan), which labeled the arteries and veins with the ITK-SNAP toolkit, and we will be releasing the Tongren dataset.
One of the limitations of our work is that large-scale fundus images cannot be accommodated by the network; such images should be reduced to patches of a reasonable size to facilitate the training and testing processes. This patch-based approach distorts the global view of capillaries and large vessels. The other limitation is that computational resources are highly demanding. Therefore, we hope to use our work as a basis to further analyze the performance of vessel segmentation and A/V classification algorithms for large-scale fundus images and improve the utilization of computational resources.
In the future, we will deploy our algorithm to mobile terminals and develop an automatic retinal blood vessel analysis system, which is more conducive to clinicians' understanding and use of this algorithm and promotes the diagnosis of ophthalmology and systemic diseases.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/supplementary material, further inquiries can be directed to the corresponding author/s.