- 1School of Civil Engineering and Transportation, Guangzhou University, Guangzhou, Guangdong, China
- 2Guangzhou Testing Center of Construction Quality and Safety Co., Ltd., Guangzhou, Guangdong, China
- 3School of Architecture and Civil Engineering, The University of Adelaide, Adelaide, SA, Australia
- 4Research Center for Wind Engineering and Engineering Vibration, Guangzhou University, Guangzhou, Guangdong, China
Introduction: This paper evaluates the robustness and generalization ability of five recently developed Convolutional Neural Networks (CNNs): Visual Geometry Group 16 (VGG16), Google Inception Net (GoogLeNet), Mobile Network version 3 Large (MobileNetV3-Large), Efficient Network B0 (EfficientNetB0) and Efficient Network version 2 Small (EfficientNetV2-S), on crack recognition and classification.
Methods: This study proposes a semantic segmentation method based on VGG16-U-Net to automatically address background noise in the images, and transfer learning with fine-tuning is used to improve the performance of the CNNs on the bridge crack image dataset and the building crack image dataset (transverse cracks, vertical cracks, oblique cracks and irregular cracks).
Results: The results indicate that the MobileNetV3-Large has the best performance. For the low-resolution building crack image dataset, the accuracy of the crack recognition reaches 99.58% and the F1-score reaches 99.60%. The accuracy of the classification reaches 94.70% and the Macro-F1 reaches 94.71%. For the higher resolution bridge crack image dataset, the accuracy of the classification reaches 95.70% and the Macro-F1 reaches 95.67%.
Discussion: The results show that the MobileNetV3-Large has the best robustness and generalization ability with a small CNN size and the shortest training time.
1 Introduction
Visualization and identification of the crack are important to evaluate the condition of structures (Cheng et al., 2019). For reinforced concrete structures, cracks could appear due to applied loading and environmental conditions, such as stress induced by the temperature difference and seismic action (Zhiguo et al., 2019). Cracks can reduce the safety of a structure by depriving reinforcement of protection (Ma, 2014) and increasing the possibility of reinforcement corrosion (Otieno et al., 2010). Regular or long-term monitoring and measurement of critical cracks can reveal the condition of the structures and evaluate structural safety (Zhang and Stang, 1998). Therefore, it is very important to develop a rapid and efficient technique for the identification and classification of cracks in reinforced concrete structures.
Crack recognition and classification is an important task for preventing further damage and ensuring the structural safety of civil infrastructure (Sharma et al., 2018). At present, the most commonly used method for crack identification is still manual inspection, which is highly subjective (Dais et al., 2021) and can cause misjudgment (Phares et al., 2004). In addition, manual inspection is labor-intensive and inefficient.
The rapid developments of Convolutional Neural Networks (CNNs) and Graphics Processing Units (GPUs) have promoted the application of CNNs in different fields. For instance, in the field of road engineering, Garbowski and Gajewski (2017) utilized 3D point cloud modeling for high-precision automated identification and quantitative assessment of pavement cracks, while Nhat-Duc et al. (2018) employed a CNN-based model to achieve automated crack detection in asphalt pavements. In the domain of bridge structures, Yu et al. (2021) developed a rapid and high-accuracy detection method for bridge cracks by employing the YOLOv4-FPM model integrated with a pruning algorithm. In railway engineering, Aldao et al. (2023) utilized the DeepLab V3+ model coupled with an image segmentation algorithm to achieve automated identification, missing-part detection and fastening condition assessment of railway track bolts and fasteners. For municipal drainage systems, Yin et al. (2020) applied the YOLOv3 object detection model to automatically locate and classify multiple types of pipeline defects. In underwater engineering, Fan et al. (2022) introduced an attention mechanism via the MA-AttUNet model, enabling high-precision automated identification and segmentation of underwater dam cracks.
In recent years, CNNs have been widely adopted in crack recognition and classification. Different from traditional approaches, CNNs can automatically extract image features without manual interpretation and can realize object detection and image classification (Kohlhepp, 2020; Zhao et al., 2019). Visual Geometry Group 16 (VGG16) (Silva and Lucena, 2018) was used to identify a dataset containing 2336 concrete crack images and 1164 non-crack images; the effect of different learning rates on crack detection was investigated and a best overall accuracy of 92.27% was finally obtained. Yusof et al. (2018) binarized the crack images and trained the CNN proposed in their study, which achieved 98% crack recognition accuracy, while the classification accuracy for lateral cracks and vertical cracks was 98% and 97%, respectively. Shengyuan and Xuefeng (2019) applied a modified AlexNet to train on a crack dataset without preprocessing and the results showed that the crack recognition accuracy reached 99.06%. Dais et al. (2021) used CNNs (VGG, ResNet, DenseNet, Inception, MobileNet) to identify cracks in masonry structures, trained without image preprocessing; the results showed that MobileNet had the highest accuracy of 95.3%. Li et al. (2020) constructed CNNs with different receptive fields using convolutional, pooling and fully connected layers and trained them on a road crack dataset (containing non-crack, transverse crack, longitudinal crack, block crack and alligator crack images). Song et al. (2019) established a multi-scale dilated convolution module and introduced an attention mechanism, training on a dataset containing lateral cracks, vertical cracks, massive cracks and crocodile cracks without preprocessing the images; the results showed that the classification accuracy of lateral and vertical cracks was above 95%, while that of massive and crocodile cracks was above 86%. Liu et al. (2022) used deep learning (MobileNet, ResNet, DenseNet and EfficientNet) and infrared thermography to classify the severity of asphalt pavement cracks and established a dataset of asphalt pavement cracks including non-crack, low-severity, medium-severity and high-severity crack images. The results showed that these CNNs perform well on non-crack and low-severity crack images, while classification errors mostly occur on medium-severity and high-severity crack images; among them, EfficientNet-B3 obtains high accuracy for low-severity, medium-severity and high-severity cracks.
The above studies illustrate that different cracks are not equally difficult for a model to identify. Therefore, it is necessary to carry out classification research for various crack types.
In recent years, research on crack identification and classification has focused on deep learning. However, some key challenges remain: i) environmental factors, which affect the robustness of the deep learning technology; ii) data obtained from different scenarios, which makes the generalization ability of CNNs very important; and iii) the demand for online monitoring, which requires a lightweight structure for applying CNNs in practical situations.
1.1 Frontier mainstream of convolutional neural network
At present, CNNs-based crack detection focuses on how CNNs realize crack recognition with high accuracy. However, for practical engineering, the robustness, generalization ability and lightweight design of the CNNs also need to be investigated. In this study, the following five representative CNNs are selected for study.
1. VGG16: In the early stage of CNN research, researchers believed that the depth of the CNN affects its final performance. Thus, VGG16 (Simonyan and Zisserman, 2014) was proposed, which increases the depth of the CNN and uses 3 × 3 convolution kernels to make the CNN more concise and efficient.
2. Google Inception Net (GoogLeNet): When the network grows dramatically, it is challenging to avoid excessive computation while improving the performance of CNNs. To solve this problem, GoogLeNet (Szegedy et al., 2014) was proposed based on the Inception module, which improves the performance of the CNN by using dense matrices without significantly increasing the computational cost.
3. Mobile Network version 3 Large (MobileNetV3-Large): Although GoogLeNet and VGG16 perform well in image classification, it is difficult to apply these CNNs to scenarios requiring fast response and small memory footprints due to their large sizes and slow training. Therefore, MobileNetV3-Large (Howard et al., 2019) was proposed, which uses the Neural Architecture Search (NAS) algorithm and introduces a lightweight attention module with the Squeeze-and-Excitation (SE) (Jie et al., 2017) structure.
4. Efficient Network B0 (EfficientNetB0): Previously, researchers needed to manually scale the CNN's depth, width and input image size. However, only a single dimensional parameter of the CNN could be adjusted at a time, subject to the limitation of computing resources, so the best combination of dimensions was difficult to find. Mingxing and Quoc (2019) argued that the CNN's dimensional parameters influence each other and that the CNN's dimensions can be balanced by utilizing Neural Architecture Search (NAS) to obtain the optimal network structure, EfficientNetB0.
5. Efficient Network version 2 Small (EfficientNetV2-S): In 2021, Tan and Le (2021) replaced the shallow MBConv of EfficientNet with Fused-MBConv and proposed EfficientNetV2-S. EfficientNetV2-S realized accelerated training using the data samples by modifying the progressive learning strategy and proposing the mechanism of adaptive adjustment of regularization parameters.
1.2 Image preprocessing and semantic segmentation
The quality of the crack image preprocessing directly affects the results of crack classification and recognition. To improve the crack classification and recognition performance, efficient image preprocessing is essential. Different from the traditional image preprocessing, the semantic segmentation (Linda and George, 2001) can learn and extract crack image features. Semantic segmentation is an important branch of image processing and computer vision. U-Net (Ronneberger et al., 2015) was developed based on the Fully Convolutional Network (FCN) to realize biomedical image segmentation, in which convolutional coding and convolutional decoding of this network are completely symmetric. The combination of low-level feature maps is constructed into high-level complex features to achieve precise positioning and solve the problem of image segmentation, which is an FCN with good scalability. Jenkins et al. (2018) used U-Net to semantically segment road crack images and found that the semantic segmentation accuracy of vertical crack images is lower than that of lateral cracks. Jacob et al. (2019) embedded the attention mechanism and residual convolution block module on U-Net for the first time and used the residual connection between convolution modules to semantically segment road cracks. According to Piao (2019), U-Net integrating with VGG16 (VGG16-U-Net), in which the encoder has the full advantage of VGG16 and U-Net, can address the under-segmentation phenomenon. Therefore, this study proposes to use VGG16-U-Net to remove image background noise, extract image crack features and achieve automatic image preprocessing.
1.3 Improvement of generalization ability using transfer learning
The generalization ability of the CNNs is also very important for crack identification and classification. Studies (Alipour et al., 2019; Zhang et al., 2019) have shown that the generalization ability of the CNNs cannot be studied using a single dataset. Therefore, one of the objectives of this paper is to verify the generalization of CNNs through two different datasets.
The performance of a CNN relies on a large amount of labeled data. However, the acquisition and labeling of such datasets are very time-consuming. Transfer learning addresses the CNN's dependence on large amounts of labeled data: it transfers the trained CNN parameters (Mohsen et al., 2020) to a new network for training, which allows the CNN to achieve higher accuracy and training speed (Tan et al., 2018). The most commonly used transfer learning technique is fine-tuning (Long et al., 2015). The CNN loads pre-trained weights during the training process and freezes all weights except those of the last convolutional layer and the fully connected layer. Finally, the weights of the last convolutional layer and the fully connected layer are retrained with a new learning rate. Transfer learning has been widely used in the identification of structural damage (Mohsen et al., 2020). Dung et al. (2019) used fine-tuning and data augmentation on VGG16 and performed fatigue crack detection at steel bridge nodes, which proved that data augmentation and fine-tuning improve the accuracy and robustness. Rajadurai and Kang (2021) applied fine-tuning to AlexNet and trained it on a crack dataset; the crack recognition accuracy reached 99%. Therefore, transfer learning is used in this study to improve the CNN accuracy.
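The freezing scheme described above can be sketched in PyTorch. The snippet below is an illustrative toy network, not the paper's actual code; the layer indices and the four-class head are assumptions (in practice, pre-trained ImageNet weights would be loaded into e.g. MobileNetV3-Large first):

```python
import torch
import torch.nn as nn

# Stand-in CNN: early layers play the role of the pre-trained feature
# extractor; the last conv layer and the FC head are the parts fine-tuned.
model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),   # early layers: frozen
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),  # last conv layer: retrained
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(32, 4),                            # FC head: 4 crack classes (assumed)
)

# Freeze everything, then unfreeze only the last conv layer and the FC head.
for p in model.parameters():
    p.requires_grad = False
for p in list(model[2].parameters()) + list(model[6].parameters()):
    p.requires_grad = True

# Only the unfrozen parameters would be handed to the optimizer
# and retrained with the new learning rate.
trainable = [p for p in model.parameters() if p.requires_grad]
```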
Based on the above studies in the literature, this study proposes VGG16-U-Net to perform semantic segmentation on crack images to eliminate image noise. Five pre-trained CNNs, MobileNetV3-Large, VGG16, GoogLeNet, EfficientNetB0 and EfficientNetV2-S, are adopted for crack identification and classification. To further improve the performance of the CNNs, fine-tuning is used to freeze part of the weights during training. Stochastic Gradient Descent with Warm Restarts (SGDR) is selected as the optimizer to avoid the local optimum issue.
In this study, we 1) study the crack identification ability of CNNs: five different types of CNNs are trained using the building cracks dataset (including crack and non-crack images); 2) verify the crack classification and generalization capabilities of the CNNs: the five CNNs are trained on the same building cracks dataset and on the bridge cracks dataset (including transverse crack, vertical crack, oblique crack and irregular crack images); and 3) study the contribution of transfer learning, focusing on its influence on MobileNetV3-Large, which shows the best performance in crack recognition. In addition, we investigate the contribution of semantic segmentation by comparing and analyzing in detail the classification results of the five CNNs before and after semantic segmentation.
2 Image preprocessing and evaluation indicator
2.1 Image semantic segmentation based on VGG16-U-net
In this study, the encoder of U-Net employs the first 15 layers of VGG16 and dropout layers are added between the convolutional layers to prevent over-fitting. In the decoder, the image is up-sampled using deconvolution layers (Dais, et al., 2021), which gradually restore the features to the original size of the image. After the decoder, a 1 × 1 convolutional layer and a sigmoid activation function are connected to generate a prediction for each pixel in the image. Moreover, the encoder and decoder are connected by skip connections and, finally, the VGG16-U-Net is constructed as shown in Figure 1. The pre-trained VGG16-U-Net is adopted to separate the cracks from the complex picture background in this study. The comparison before and after semantic segmentation is shown in Figure 2, where the background pixels are represented in black and the crack pixels in white.
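The encoder-decoder pattern described above can be illustrated with a reduced two-level sketch in PyTorch. This is an assumption-laden miniature, not the paper's network: the real VGG16-U-Net uses the first 15 VGG16 layers and more encoder/decoder levels, and the channel widths here are placeholders.

```python
import torch
import torch.nn as nn

def conv_block(cin, cout):
    # Two 3x3 convolutions, VGG16-style (a reduced sketch of one encoder stage).
    return nn.Sequential(
        nn.Conv2d(cin, cout, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(cout, cout, 3, padding=1), nn.ReLU(inplace=True),
    )

class TinyVGGUNet(nn.Module):
    """Two-level illustration of the VGG16-U-Net idea: VGG-style encoder,
    symmetric deconvolution decoder, a skip connection, and a
    1x1 convolution + sigmoid head producing a per-pixel prediction."""
    def __init__(self):
        super().__init__()
        self.enc1 = conv_block(3, 16)
        self.enc2 = conv_block(16, 32)
        self.pool = nn.MaxPool2d(2)
        self.up = nn.ConvTranspose2d(32, 16, 2, stride=2)  # deconvolution upsampling
        self.dec1 = conv_block(32, 16)                     # 32 = 16 (skip) + 16 (upsampled)
        self.head = nn.Sequential(nn.Conv2d(16, 1, 1), nn.Sigmoid())

    def forward(self, x):
        e1 = self.enc1(x)                   # feature map kept for the skip connection
        e2 = self.enc2(self.pool(e1))
        d1 = self.dec1(torch.cat([self.up(e2), e1], dim=1))
        return self.head(d1)                # per-pixel crack probability in (0, 1)

mask = TinyVGGUNet()(torch.zeros(1, 3, 64, 64))
```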
2.2 CNNs parameters setting and optimization
The original crack datasets and the semantically segmented ones were used to train VGG16, GoogLeNet, MobileNetV3-Large, EfficientNetB0 and EfficientNetV2-S, respectively. In addition, fine-tuning is applied in the training process, employing weights pre-trained on ImageNet.
To achieve a balance between model convergence and training time, each CNN for crack recognition was trained for 20 epochs, while each CNN for crack classification was trained for 100 epochs. Masters and Luschi (2018) proved that, for limited computing resources, the versatility and stability of a CNN trained with a large batch size are worse than those of one trained with a small batch size.
Optimization of deep learning networks updates the weights to minimize a loss with multiple local minima and a global optimal solution. In this study, SGDR (Loshchilov and Hutter, 2016) is utilized as the CNN optimizer to avoid local minima. During the training process, when the loss falls into a local minimum, SGDR increases the learning rate to escape the local minimum and find a path to the global minimum. The formula is shown as Equation 1:

η_t = η_min + (1/2)(η_max − η_min)(1 + cos(π T_cur / T_i)) (1)

where η_t is the learning rate at the current epoch, η_min and η_max are the minimum and maximum learning rates, T_cur is the number of epochs since the last warm restart and T_i is the total number of epochs in the current restart cycle.
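The warm-restart schedule can be sketched in plain Python as a direct transcription of the cosine-annealing rule (variable names are illustrative):

```python
import math

def sgdr_lr(eta_min, eta_max, t_cur, t_i):
    # Cosine-annealed learning rate within one restart cycle (SGDR).
    # t_cur: epochs since the last warm restart; t_i: length of the cycle.
    return eta_min + 0.5 * (eta_max - eta_min) * (1 + math.cos(math.pi * t_cur / t_i))

# At a restart (t_cur = 0) the rate jumps back to eta_max, which helps the
# optimizer climb out of a local minimum before annealing down again.
```

PyTorch ships an equivalent built-in scheduler, `torch.optim.lr_scheduler.CosineAnnealingWarmRestarts`, which could be used instead of hand-rolling the rule.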
The improved cross-entropy loss function is treated as the loss function of the optimizer in this study. Compared with other loss functions, the cross-entropy loss function (Jamin and Humeau-Heurtier, 2019) is more robust under noisy data and converges to a better local minimum under less noisy data (Sga et al., 2021). To improve the computational efficiency, this study activates the output of the fully connected layer with the softmax function and maps the output to the interval (0,1). Then, the mapped output is processed by the cross-entropy loss function. The softmax activation function (see Equation 2) and cross-entropy loss function (see Equation 3) applied in this study are:

Softmax(z_i) = exp(z_i) / (exp(z_1) + exp(z_2) + … + exp(z_n)) (2)

where z_i is the output of the fully connected layer for the i-th category and n is the number of categories;

Loss = −(y_1 log p_1 + y_2 log p_2 + … + y_n log p_n) (3)

where y_i is the true label of the i-th category (1 for the true category and 0 otherwise) and p_i is the probability of the i-th category mapped by the softmax function.
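The softmax mapping and the cross-entropy loss can be sketched in plain Python (an illustrative implementation, not the paper's code):

```python
import math

def softmax(z):
    # Map raw network outputs to probabilities in (0, 1) that sum to 1.
    m = max(z)                          # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, true_idx):
    # With a one-hot label, only the true-class term of the sum survives.
    return -math.log(probs[true_idx])

p = softmax([2.0, 1.0, 0.1])      # raw scores for three hypothetical classes
loss = cross_entropy(p, 0)        # label says class 0 is the true category
```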
2.3 Evaluation index
2.3.1 Evaluation index for cracks identification
Evaluation Index 1: Accuracy (Dais et al., 2021), which represents the proportion of correctly predicted samples in the total samples, is defined as Equation 4:

Accuracy = (TP + TN) / (TP + FP + TN + FN) (4)
where TP, FP, TN and FN are the numbers of true positives, false positives, true negatives and false negatives, respectively. When the dataset categories are imbalanced, Accuracy can incorrectly evaluate the classification performance of the CNN: if a testing set is dominated by a mainstream category, a CNN that classifies the entire testing set as the mainstream category still obtains a high Accuracy, even though the weak categories are all misidentified as the mainstream category. Therefore, Accuracy alone cannot fully evaluate the performance of the CNN. The F1-score addresses this problem, as it is an evaluation parameter that comprehensively considers precision and recall. The formulas of precision and recall are given in Equations 5, 6:

Precision = TP / (TP + FP) (5)

Recall = TP / (TP + FN) (6)
Evaluation Index 2: The F1-score takes both the precision and the recall of the CNN into account and is mainly used to measure the accuracy of crack recognition. It is defined as Equation 7:

F1-score = 2 × Precision × Recall / (Precision + Recall) (7)
In this study, the accuracy and F1-score are used as the indexes to evaluate the performance of the CNNs in crack recognition.
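As a concrete illustration, the recognition indexes above can be computed directly from the four confusion counts; the sketch below uses made-up example counts:

```python
def recognition_metrics(tp, fp, tn, fn):
    # Accuracy, precision, recall and F1-score from the confusion counts.
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

# Made-up counts: 90 true positives, 10 false positives,
# 95 true negatives, 5 false negatives.
acc, prec, rec, f1 = recognition_metrics(90, 10, 95, 5)
```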
2.3.2 Evaluation index for cracks classifications
To study the crack classifications, Macro-F1 (see Equation 8) is used as the evaluation index. The F1-score of each category is calculated first and the average of all the F1-scores gives Macro-F1:

Macro-F1 = (F1_1 + F1_2 + … + F1_n) / n (8)

where n is the number of categories and F1_i is the F1-score of the i-th category.
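Macro-F1 can be computed from a multi-class confusion matrix as sketched below; this is an illustrative implementation, and the cm[actual][predicted] layout is an assumption:

```python
def per_class_f1(cm, k):
    # F1-score of class k from a confusion matrix cm,
    # assumed here to be laid out as cm[actual][predicted].
    n = len(cm)
    tp = cm[k][k]
    fp = sum(cm[i][k] for i in range(n)) - tp
    fn = sum(cm[k][j] for j in range(n)) - tp
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

def macro_f1(cm):
    # Mean of the per-class F1-scores over the n classes.
    n = len(cm)
    return sum(per_class_f1(cm, k) for k in range(n)) / n
```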
3 Crack identification and classification results
This section summarizes the results of VGG16, GoogLeNet, MobileNetV3-Large, EfficientNetB0 and EfficientNetV2-S verified on the three datasets. The network performance, especially the robustness and the generalization ability, was compared in terms of accuracy, F1-score, training time and size.
3.1 Dataset construction
This study constructed three datasets with different task objectives based on two public crack datasets to systematically evaluate the performance of mainstream CNNs in crack identification and classification tasks. The images used in this research are sourced from the following two public datasets:
1. The building crack dataset (Lei et al., 2016; Ç.F. and Arzu Gönenç, 2018): This dataset comprises 20,000 crack images and 20,000 non-crack images, each with a resolution of 227 × 227 pixels.
2. The bridge crack dataset (Liangfu et al., 2019): This dataset contains 2,000 crack images with a resolution of 1024 × 1024 pixels.
In this study, crack classification rules were established based on common crack morphologies, as illustrated in Figure 3. Furthermore, notable differences in image characteristics exist between the two datasets (see Figure 4). The building crack dataset features lower-resolution images with uneven illumination and substantial stains, which can interfere with crack identification. Meanwhile, the bridge crack dataset, while offering higher resolution and clearer crack morphology, presents a greater diversity of irregular crack patterns and subtle stains, thereby increasing the difficulty of classification.
Figure 3. Crack categories. (a) Transverse crack. (b) Vertical crack. (c) Oblique crack. (d) Irregular crack.
In accordance with the different research objectives, this study systematically annotated and partitioned the two original crack datasets to construct three datasets, each serving a distinct purpose.
3.1.1 Dataset 1: from the building crack dataset
This dataset comprises a total of 40,000 images, with a balanced distribution of 20,000 crack and 20,000 non-crack samples. It primarily serves to evaluate the performance of the mainstream CNNs in this paper for the binary classification task of crack identification.
3.1.2 Dataset 2: from the building crack dataset
This dataset was created by manually selecting 1,703 crack images covering four crack morphologies: transverse, vertical, oblique and irregular. The dataset was expanded to 6,108 images through data augmentation techniques including flipping, mirroring, cropping, and rotation. It contains four crack morphology categories with the following distribution: 1,506 transverse crack images, 1,027 vertical crack images, 1,327 oblique crack images and 2,248 irregular crack images. This dataset is designated for evaluating the performance of mainstream CNNs in the four-class crack classification task.
3.1.3 Dataset 3: from the bridge crack dataset
To evaluate the models' generalization ability, this dataset was constructed by manually annotating 1,997 high-resolution images from the bridge crack dataset. Following the same augmentation strategy as Dataset 2, the dataset was expanded to 6,532 images. It also contains four crack morphology categories with the following distribution: 1,437 transverse crack images, 1,430 vertical crack images, 1,374 oblique crack images and 2,291 irregular crack images. Compared with Dataset 2, Dataset 3 features higher image resolution and clearer crack morphology, while presenting greater diversity in irregular crack patterns. These characteristics make it particularly suitable for comprehensively evaluating the classification performance and generalization capability of CNNs across different scenarios and imaging conditions.
To objectively evaluate the performance of different models, 4,000 images from Dataset 1 and 1,000 images each from Datasets 2 and 3 were allocated as test sets, which were excluded from the training process. The remaining images in each dataset were then split, with 80% used for training and 20% for validation.
All tests in this study were conducted on a laptop equipped with an Intel Xeon W-2245 CPU, running the Ubuntu 18.04 operating system, and utilizing the PyTorch 1.7.1 deep learning framework.
3.2 Analysis of cracks recognition results in dataset 1
To verify the performance of the five CNNs on crack recognition, they were trained on Dataset 1 for 20 epochs and their performance was then verified with the testing set. To validate the effect of semantic segmentation preprocessing, each CNN was trained both with semantic segmentation preprocessing and without image preprocessing. MobileNetV3-Large was trained with and without transfer learning on Dataset 1 to investigate the impact of transfer learning. The performance is assessed by accuracy, F1-score, training time, etc., and summarized in Table 1.
The test results of the five CNNs on crack recognition were analyzed, which are discussed as follows:
1. Accuracy and F1-score: The five CNNs show superior performance in crack recognition. The accuracy and F1-score are above 99% in all cases except VGG16 without semantic segmentation, for which the accuracy is 98.78% and the F1-score is 98.79%.
2. The influence of transfer learning: Transfer learning significantly improves the performance of CNNs. Without transfer learning, the CNNs need to spend more time learning to extract features. The crack recognition results of MobileNetV3-Large with and without transfer learning are compared; since MobileNetV3-Large has the fastest training speed, the contrast is the most obvious. It should be noted that neither case uses semantic segmentation. The results show that, without pre-training by transfer learning, MobileNetV3-Large needs a training time of 311 s, its size is 32.3 MB and its accuracy is 99.67%. With pre-training by transfer learning, the training efficiency is doubled, as it only needs a training time of 138 s, the size is only 16.2 MB and the accuracy is 99.78%. The results show that transfer learning can effectively improve the performance of CNNs.
3. The influence of semantic segmentation: Semantic segmentation accelerates the training speed and recognition speed of the five CNNs by an average of 17.16% and 17.14%, respectively. The training efficiency of GoogLeNet is improved most significantly by semantic segmentation, with an improvement rate of 33.33% and the training time reduced from 432 s to 324 s. MobileNetV3-Large shows the largest improvement in recognition efficiency, 56.2%, from 50 images per second up to 114.3 images per second. However, preprocessing the datasets with semantic segmentation has little influence on the accuracy of the recognition results, which means the five CNNs are robust in dealing with crack recognition. The accuracy and F1-score of three of the CNNs are improved by semantic segmentation preprocessing, while those of MobileNetV3-Large and GoogLeNet decreased by 0.2%, 0.23% and 0.1%, 0.15%, respectively. The reason is that semantic segmentation cannot accurately segment non-crack images. Although, in theory, an image without cracks should be processed into a pure black image, in fact some images were segmented incorrectly due to the presence of light shadows, water stains, oil stains, etc., which were recognized as non-background. Non-crack images before and after semantic segmentation are shown in Figure 5.
4. Comprehensive performance: MobileNetV3-Large with transfer learning and semantic segmentation has the best comprehensive performance: it has the shortest training time of 114 s, an accuracy of 99.58%, an F1-score of 99.60% and a model size of only 16.2 MB. EfficientNetV2-S with semantic segmentation has the highest accuracy of 99.83% and the highest F1-score of 99.80%. MobileNetV3-Large with semantic segmentation has the fastest training (114 s) and the fastest recognition speed of 114.3 images per second. In terms of size, EfficientNetB0 is the smallest at only 15.5 MB.
3.3 Analysis of crack classification results in dataset 2
As discussed in the previous section, transfer learning is beneficial for improving the performance of CNNs, so transfer learning was used in all five CNNs. To further analyze the performance, the five CNNs were trained for 100 epochs to classify the four types of cracks in Dataset 2. To verify the influence of semantic segmentation, each CNN was trained both with semantic segmentation preprocessing and without image preprocessing. Accuracy, Macro-F1, training time, recognition speed, learning rate (Lr) and batch size were calculated and summarized in Table 2.
The test results of the five CNNs for classifications in Dataset 2 were analyzed and the analysis results are as follows:
1. Comparison of the performance of the CNNs in crack recognition and classification: Compared with Dataset 1, the accuracy of the five CNNs on Dataset 2 dropped by an average of 13.38% and the Macro-F1 by an average of 13.47%. The reason is that, in crack classification, the CNNs are interfered with by background noise, and the various morphologies of the cracks affect the accuracy of the CNNs in multi-classification.
2. Accuracy and Macro-F1: Among the five CNNs, GoogLeNet can extract crack features and classify the cracks under the influence of background noise regardless of whether the images are preprocessed by semantic segmentation. Without image preprocessing, the accuracy is 93.30% and the Macro-F1 is 93.28%; with semantic segmentation preprocessing, the accuracy is 94.60% and the Macro-F1 is 94.31%. The reason is that two additional auxiliary classifiers are added in the middle layers of GoogLeNet, which also give it a strong recognition ability: the auxiliary classifiers extract middle-layer features during the training process and use them for updating the training weights. However, its training time is long and its classification speed is slow, only 9.1 images/s, which makes it difficult to use in practical applications.
3. The influence of semantic segmentation: Classifying the four crack categories in Dataset 2 with the five CNNs after semantic segmentation effectively improves the accuracy, Macro-F1 (as shown in Figure 6), training speed and classification speed without increasing the size. After semantic segmentation preprocessing of Dataset 2, the classification accuracy of each CNN increases by an average of 4.72% and the Macro-F1 by an average of 4.34%. MobileNetV3-Large shows the largest increase, with accuracy improved by 8.20% and Macro-F1 by 8.26%. The training speed of each CNN increased by an average of 6.76%, with EfficientNetB0 showing the highest improvement of 9.89%. Simultaneously, the classification speed of each CNN improved by an average of 35.90%, with MobileNetV3-Large increasing most, by 58.67%. This shows that semantic segmentation effectively improves the performance of the CNNs in crack classification.
4. Comprehensive performance: MobileNetV3-Large with semantic segmentation has the best comprehensive performance, with an accuracy of 94.70%, a Macro-F1 of 94.71%, the shortest training time of 49 s and a size of 16.2 MB. Thus, MobileNetV3-Large achieves a good balance between accuracy and size, which is valuable for practical applications.
To further discuss MobileNetV3-Large, which has the best comprehensive performance on Dataset 2, a confusion matrix is adopted to visualize its four-category crack classification results, as shown in Figure 7. In the confusion matrix, each column represents the actual category and each row represents the predicted category, in the order transverse crack, vertical crack, oblique crack and irregular crack. Thus, the 212 in the first row and first column means that 212 transverse crack samples are correctly predicted as transverse cracks; the 28 in the first row and third column means that 28 oblique crack samples are incorrectly predicted as transverse cracks.
Figure 7. Confusion matrix by MobileNetV3-Large in Dataset 2: (A) the results without semantic segmentation, (B) the results with semantic segmentation. 1, 2, 3 and 4 represent transverse crack, vertical crack, oblique crack and irregular crack, respectively.
In addition, compared with the other four CNNs, MobileNetV3-Large introduces a lightweight Squeeze-and-Excitation (SE) structure (Jie et al., 2017), as shown in Figure 8. The SE module compresses the convolutional feature map along the spatial dimensions to obtain a 1D vector whose length matches the number of channels (e.g., an RGB image has three channels), and this vector is then applied to the feature map channel by channel to recalibrate the original features. On the other hand, MobileNetV3-Large applies the h-swish activation function, which maps the input of a neuron non-linearly to its output, reduces accuracy loss and improves efficiency by approximately 15% (Howard et al., 2019). The h-swish formula is given in Equation 9:
where x is the input of the convolutional layer using the activation function, i.e., h-swish(x) = x·ReLU6(x + 3)/6, and ReLU6(y) = min(max(y, 0), 6). Thus, if x + 3 ≥ 6, then ReLU6(x + 3) = 6; if 0 ≤ x + 3 < 6, then ReLU6(x + 3) = x + 3; and if x + 3 < 0, then ReLU6(x + 3) = 0.
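As a minimal numerical sketch (not the authors' implementation), both mechanisms can be reproduced with NumPy: h-swish exactly as in Equation 9, and the SE idea as squeeze (global average pooling), excitation (two small fully connected layers, here with arbitrary all-ones weights for illustration) and channel-wise rescaling.

```python
import numpy as np

def relu6(x):
    """ReLU6(y) = min(max(y, 0), 6)."""
    return np.minimum(np.maximum(x, 0.0), 6.0)

def h_swish(x):
    """h-swish(x) = x * ReLU6(x + 3) / 6 (Equation 9)."""
    return x * relu6(x + 3.0) / 6.0

def se_recalibrate(feature_map, w1, w2):
    """Squeeze-and-Excitation: squeeze an H x W x C map to a C-vector by
    global average pooling, excite it through two small FC layers, then
    rescale each channel of the original feature map."""
    squeezed = feature_map.mean(axis=(0, 1))          # squeeze: C-vector
    hidden = np.maximum(squeezed @ w1, 0.0)           # ReLU bottleneck
    scale = 1.0 / (1.0 + np.exp(-(hidden @ w2)))      # sigmoid gate per channel
    return feature_map * scale[None, None, :]         # channel-wise rescaling

# h-swish matches its piecewise form: x <= -3 -> 0, x >= 3 -> x.
vals = h_swish(np.array([-4.0, 0.0, 3.0, 10.0]))

# SE demo on a 2 x 2 x 3 feature map with arbitrary all-ones FC weights.
fmap = np.ones((2, 2, 3))
scaled = se_recalibrate(fmap, np.ones((3, 2)), np.ones((2, 3)))
```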
5. Preferred CNN: The newest CNN adopted in this paper, EfficientNetV2-S (2021), has a slow training speed, and its accuracy and Macro-F1 do not reach the expected values. This indicates that emerging CNNs should not be selected blindly for different application scenarios and image types without justification. In other words, the performance of CNNs should be evaluated from multiple perspectives, such as robustness and generalization ability, according to the actual application and using indexes such as accuracy, Macro-F1, training time and model size, so that the appropriate CNN can be chosen; this is also the original intention and goal of this study.
3.4 Analysis of crack classification results in dataset 3
To further verify the generalization capability of the five CNNs, each CNN was trained for 100 epochs on Dataset 3, both with semantic segmentation preprocessing and without any image preprocessing. Accuracy, F1-score, training time and size were calculated and summarized in Table 3. The test results of the five CNNs for classification in Dataset 3 are analyzed as follows:
1. Accuracy and Macro-F1: Without semantic segmentation, GoogLeNet has the highest accuracy of 95.70% and Macro-F1 of 95.70%, while VGG16 has the lowest accuracy of 86.20% and Macro-F1 of 86.61%. With semantic segmentation preprocessing, MobileNetV3-Large has the highest accuracy of 95.70% and Macro-F1 of 95.70%, as shown in Figure 9.
2. The influence of semantic segmentation: The bridge crack images in Dataset 3 have a higher resolution, and semantic segmentation is useful for the CNNs to extract crack features (Figure 10). With semantic segmentation preprocessing, the accuracy and Macro-F1 of each CNN improve by 5.12% and 4.85% on average, respectively, except for a slight decrease with GoogLeNet.
Figure 10. Comparison of the Macro-F1 for crack classification by the five CNNs: (A) the results without semantic segmentation; (B) the results with semantic segmentation.
Without semantic segmentation preprocessing, the F1-score of irregular cracks is significantly lower than that of the other three crack types. The reason is that irregular cracks are hard samples to classify. The CNNs can extract the features of simple cracks (such as transverse and oblique cracks) and classify them effectively, but they are affected by the background noise of the irregular crack images, so the extracted crack features are insufficient. With semantic segmentation preprocessing, the F1-score of the irregular cracks increases significantly, as shown in Figure 10B. Therefore, background noise can be removed effectively by semantic segmentation, which is beneficial for extracting the image features. The results show that MobileNetV3-Large has the highest F1-score for irregular cracks, at 93.4%.
3. Comprehensive performance: MobileNetV3-Large with semantic segmentation achieves the best comprehensive performance when considering model size, the feasibility of deployment on mobile devices and training time. It should be noted that MobileNetV3-Large has the shortest training time of 44 s, an accuracy of 95.70%, a Macro-F1 of 95.69% and the fastest classification speed of 47.6 images per second, with a size of 16.2 MB. MobileNetV3-Large also performed excellently on Datasets 1 and 2 and is robust, which indicates that it has practical application value.
To further discuss MobileNetV3-Large, which has the best comprehensive performance, its four-category classification results for Dataset 3 are shown as a confusion matrix in Figure 11. The results show that the number of correctly predicted images increases with semantic segmentation. For example, the number of correctly predicted transverse cracks is 218 without semantic segmentation, which increases to 238 with semantic segmentation. Similarly, the number of vertical cracks increases from 228 to 245, oblique cracks from 205 to 242 and irregular cracks from 217 to 232. In addition, Figure 12 shows the loss convergence curve of MobileNetV3-Large: with semantic segmentation, the network converges better and its final loss is lower than without it.
4. Robustness and generalization ability of the CNNs: Comparing the performance of the five CNNs on Dataset 2 and Dataset 3, each CNN performs better on both datasets with semantic segmentation. This illustrates that the proposed semantic segmentation preprocessing with VGG16-U-Net is an effective way to enhance the robustness of the five CNNs. On the other hand, with regard to the generalization ability of the five CNNs when facing crack recognition and classification problems, semantic segmentation and transfer learning play an important role.
Figure 11. Confusion matrix by MobileNetV3-Large in Dataset 3: (A) the results without semantic segmentation; (B) the results with semantic segmentation. The labels 1, 2, 3 and 4 represent transverse crack, vertical crack, oblique crack and irregular crack, respectively.
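The transfer-learning strategy referred to above can be illustrated with a self-contained sketch: a pretrained backbone is kept frozen and only a newly attached classification head is trained on the target crack dataset. The random "features" below stand in for frozen backbone outputs (e.g., the penultimate layer of a pretrained CNN); the dimensions, learning rate and label rule are all illustrative assumptions, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)   # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

# Stand-in for frozen backbone outputs: 40 samples of 8-dim features.
features = rng.normal(size=(40, 8))
# Synthetic 4-class labels that a linear head can learn from the features.
labels = features[:, :4].argmax(axis=1)

# Fine-tuning here = training only a fresh linear head by gradient descent;
# the backbone (the feature extractor) stays frozen throughout.
W = np.zeros((8, 4))
one_hot = np.eye(4)[labels]
for _ in range(200):
    probs = softmax(features @ W)
    grad = features.T @ (probs - one_hot) / len(labels)  # cross-entropy grad
    W -= 0.5 * grad

train_acc = (softmax(features @ W).argmax(axis=1) == labels).mean()
```

In a real pipeline the same idea applies: load pretrained weights, replace the final classifier with a four-class head, and train only (or mostly) that head.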
The efficiency of recognition and classification by the five CNNs in this study is higher in Dataset 3 than in Dataset 2, because Dataset 3 has a higher resolution of 1024 × 1024 while Dataset 2 has a lower resolution of 227 × 227. Dataset 2 was input into the CNNs as a 227 × 227 × 3 RGB matrix, while Dataset 3 was treated as a 1024 × 1024 × 3 matrix, which contains more information and helps the CNNs extract more image features. Thus, a higher-resolution camera is recommended for obtaining a high-quality dataset under good lighting conditions when using deep learning for crack detection.
5. Over-fitting phenomenon: The over-fitting phenomenon (Babyak, 2004), in which a CNN shows a large performance gap between the training and testing processes, exists in each CNN, but semantic segmentation can help reduce it. The comparison of accuracy for Dataset 3 on the training and testing sets is shown in Figure 13. Without semantic segmentation, the training-set accuracy of VGG16 and EfficientNetB0 is 98.80% and 97.50%, respectively, while their testing-set accuracy is only 86.20% and 88.80%, revealing serious over-fitting. When the images are preprocessed with semantic segmentation, the accuracy of each CNN on the training and testing sets differs only slightly; MobileNetV3-Large performs best, with the smallest gap of 0.9% between the testing set and the training set. Thus, the over-fitting problem is alleviated by semantic segmentation.
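Using the accuracy figures quoted above, the over-fitting gap can be made explicit as a simple train-minus-test difference; the dictionary below just restates those numbers for the no-segmentation case.

```python
# Train/test accuracies without semantic segmentation, as quoted in the text.
results = {
    "VGG16":          (0.9880, 0.8620),
    "EfficientNetB0": (0.9750, 0.8880),
}

def gap(train_acc, test_acc):
    """Over-fitting shows up as a large train/test accuracy gap."""
    return train_acc - test_acc

gaps = {name: gap(tr, te) for name, (tr, te) in results.items()}
worst = max(gaps, key=gaps.get)   # the CNN with the most severe over-fitting
```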
6. Insufficient ability of the CNNs to extract edge crack features: It is difficult to extract crack features when the crack lies at the edge or corner of the image. The incorrectly predicted images are visualized with TensorBoard, as shown in Figure 14. In these four images, a single morphological crack occupies the main body of the image, while other cracks extend from the edges or corners like irregular cracks. For instance, in Figure 14d, an oblique crack is the primary feature, but a secondary vertical crack extending along the right edge of the image introduces complexity. As a result of this interference, the CNN correctly identified the "oblique crack" class, albeit with a relatively low prediction probability of 62.3%.
Figure 14. Prediction of original images with semantic segmentation (Pred is the predicted label and Prob is the probability of the prediction): (a) the predicted probability of the transverse crack is 96.20%; (b) the predicted probability of the transverse crack is 50.16%; (c) the predicted probability of the vertical crack is 96.20%; (d) the predicted probability of the oblique crack is 62.30%.
4 Conclusion
This paper has adopted VGG16-U-Net to perform automatic image preprocessing with semantic segmentation and has compared the performance of VGG16, GoogLeNet, MobileNetV3-Large, EfficientNetB0 and EfficientNetV2-S in crack recognition and classification. The fine-tuning technique of transfer learning used in the training process can reduce training costs and improve CNN performance. The conclusions are as follows:
1. Performance comparison in crack recognition and classification: In crack recognition, the CNNs used in this paper recognized cracks with an accuracy of more than 99% (except for VGG16). However, the CNNs have limitations in four-category crack classification: without semantic segmentation, they cannot maintain both high accuracy and short training times.
2. The role of semantic segmentation: Semantic segmentation can significantly improve the performance of the CNNs and accelerate training; MobileNetV3-Large with semantic segmentation has the best performance, with 95.70% accuracy and 95.69% Macro-F1 in Dataset 3. Even in Dataset 2, which has lower-resolution images, MobileNetV3-Large still achieves 94.70% accuracy and 94.71% Macro-F1, which proves that MobileNetV3-Large combined with semantic segmentation by VGG16-U-Net has the best robustness. The newest EfficientNetV2-S (proposed in 2021) and EfficientNetB0 (proposed in 2019) have also been studied in this paper, and it has been found that these two CNNs do not meet the expected performance. This indicates that emerging CNNs should be selected according to the needs of the actual application, and that the use of the latest deep learning technology from computer science for civil engineering problems should be carefully considered and investigated.
3. Using a high-resolution image dataset: The results of this study show that high-resolution data are useful for CNNs to extract features. Therefore, when deep learning is applied to crack inspection in practice, a high-resolution camera should be used under good lighting conditions.
4. Alleviating over-fitting by semantic segmentation: Semantic segmentation can reduce the over-fitting phenomenon during the training of CNNs. MobileNetV3-Large with semantic segmentation by VGG16-U-Net fits best, with only a 0.9% accuracy gap between the training set and the testing set.
5. Irregular cracks are hard samples to classify: This study has found that irregular cracks are difficult to classify and are often accompanied by greater image noise. Meanwhile, it is difficult to extract the features of cracks at the edges and corners of images. Therefore, in future studies, we intend to investigate how to handle hard classification samples such as irregular cracks and how to extract edge cracks.
The findings of this study can provide references for surface crack identification and classification using CNNs in other fields. To further improve the classification and recognition performance of the CNNs, the retention and effective extraction of image edge features need to be considered.
Data availability statement
Publicly available datasets were analyzed in this study. This data can be found here: https://data.mendeley.com/datasets/5y9wdsg2zt/1.
Author contributions
LC: Formal Analysis, Data curation, Writing – review and editing, Methodology, Validation, Writing – original draft, Investigation, Resources, Funding acquisition, Conceptualization, Software. HY: Conceptualization, Validation, Software, Writing – original draft, Methodology, Visualization. KG: Conceptualization, Data curation, Investigation, Resources, Writing – review and editing. ZH: Validation, Writing – review and editing, Formal Analysis. JZ: Project administration, Writing – review and editing, Supervision, Formal Analysis. C-TN: Investigation, Writing – review and editing, Validation. JF: Supervision, Writing – review and editing, Project administration, Funding acquisition.
Funding
The authors declare that financial support was received for the research and/or publication of this article. The research was sponsored by the Key Program of the National Natural Science Foundation of China (Grant No. 52538010, Fu), the National Natural Science Foundation of China (Grant No. 52108276, Chen), Basic and Applied Basic Research Foundation of Guangdong Province (Grant No. 2020A1515110870, Chen), Science and Technology Projects in Guangzhou (Grant No. 202102010447, Chen) and Tertiary Education Scientific research project of Guangzhou Municipal Education Bureau (Grant No. 2024312401; Chen).
Conflict of interest
Author KG was employed by Guangzhou Testing Center of Construction Quality and Safety Co., Ltd.
The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Generative AI statement
The authors declare that no Generative AI was used in the creation of this manuscript.
Any alternative text (alt text) provided alongside figures in this article has been generated by Frontiers with the support of artificial intelligence and reasonable efforts have been made to ensure accuracy, including review by the authors wherever possible. If you identify any issues, please contact us.
Publisher’s note
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
References
Aldao, E., Fernández-Pardo, L., González-deSantos, L. M., and González-Jorge, H. (2023). Comparison of deep learning and analytic image processing methods for autonomous inspection of railway bolts and clips. Constr. Build. Mater. 384, 131472. doi:10.1016/j.conbuildmat.2023.131472
Alipour, M., Harris, D. K., and Miller, G. (2019). Robust pixel-level crack detection using deep fully convolutional neural networks. J. Comput. Civ. Eng. 33 (6), 04019040. doi:10.1061/(asce)cp.1943-5487.0000854
Babyak, M. A. (2004). What you see May not be what you get: a brief, nontechnical introduction to overfitting in regression-type models. Psychosom. Med. 66 (3), 411–421. doi:10.1097/00006842-200405000-00021
Cheng, X., Ji, X., Henry, R. S., and Xu, M. (2019). Coupled axial tension-flexure behavior of slender reinforced concrete walls. Eng. Struct. 188, 261–276. doi:10.1016/j.engstruct.2019.03.026
Özgenel, Ç. F., and Sorguç, A. G. (2018). "Performance comparison of pretrained convolutional neural networks on crack detection in buildings," in Proc., 35th international symposium on automation and robotics in construction (ISARC 2018) (IAARC Publications), 693–700.
Dais, D., Bal, İ. E., Smyrou, E., and Sarhosis, V. (2021). Automatic crack classification and segmentation on masonry surfaces using convolutional neural networks and transfer learning. Autom. Constr. 125, 103606. doi:10.1016/j.autcon.2021.103606
Dung, C. V., Sekiya, H., Hirano, S., Okatani, T., and Miki, C. (2019). A vision-based method for crack detection in gusset plate welded joints of steel bridges using deep convolutional neural networks. Autom. Constr. 102, 217–229. doi:10.1016/j.autcon.2019.02.013
Fan, X., Cao, P., Shi, P., Chen, X., Zhou, X., and Gong, Q. (2022). An underwater dam crack image segmentation method based on multi-level adversarial transfer learning. Neurocomputing 505, 19–29. doi:10.1016/j.neucom.2022.07.036
Garbowski, T., and Gajewski, T. (2017). Semi-automatic inspection tool of pavement condition from three-dimensional profile scans. Procedia Eng. 172, 310–318. doi:10.1016/j.proeng.2017.02.004
Howard, A., Sandler, M., Chu, G., Chen, L.-C., Chen, B., Tan, M., et al. (2019). “Searching for MobileNetV3,” in Proc., proceedings of the IEEE/CVF international conference on computer vision, 1314–1324.
Jamin, A., and Humeau-Heurtier, A. (2019). (multiscale) cross-entropy methods: a review. Entropy 22 (1), 45. doi:10.3390/e22010045
Jenkins, M., Carr, T. A., Iglesias, M. I., Buggy, T., and Morison, G. (2018). "A deep convolutional neural network for semantic pixel-wise segmentation of road and pavement surface cracks," in Proc., 2018 26th European signal processing conference (EUSIPCO).
Jacob, K., Mark David, J., Peter, B., Mike, M., and Gordon, M. (2019). "A convolutional neural network for pavement surface crack segmentation using residual connections and attention gating," in Proc., 2019 IEEE international conference on image processing (ICIP).
Jie, H., Li, S., Samuel, A., Gang, S., and Enhua, W. (2017). "Squeeze-and-Excitation networks," in Proceedings of the IEEE conference on computer vision and pattern recognition.
Lei, Z., Fan, Y., Yimin Daniel, Z., and Ying Julie, Z. (2016). "Road crack detection using deep convolutional neural network," in Proc., 2016 IEEE international conference on image processing (ICIP).
Liangfu, L., Weifei, M., and Cheng, L. (2019). Research on bridge crack detection algorithm based on deep learning. J. Autom. 45 (9), 1727–1742.
Li, B., Wang, K. C., Zhang, A., Yang, E., and Wang, G. (2020). Automatic classification of pavement crack using deep convolutional neural network. Int. J. Pavement Eng. 21 (4), 457–463. doi:10.1080/10298436.2018.1485917
Liu, F. Y., Wang, L. B., and Liu, J. (2022). Deep learning and infrared thermography for asphalt pavement crack severity classification. Autom. Constr. 140, 104383. doi:10.1016/j.autcon.2022.104383
Long, J., Shelhamer, E., and Darrell, T. (2015). “Fully convolutional networks for semantic segmentation,” in Proc., 2015 IEEE conference on computer vision and pattern recognition (CVPR).
Loshchilov, I., and Hutter, F. (2016). “SGDR: stochastic gradient descent with warm restarts,” in Proc., ICLR 2017 (5th International Conference on Learning Representations).
Ma, J. (2014). A Summary of academic research on Chinese Bridge Engineering·2014. J. Chin. Highw. 27 (5), 1–96.
Masters, D., and Luschi, C. (2018). Revisiting small batch training for deep neural networks. arXiv Preprint arXiv:1804.07612, 07612. doi:10.48550/arXiv.1804.07612
Mingxing, T., and Quoc, V. L. (2019). "EfficientNet: rethinking model scaling for convolutional neural networks," in Proc., 36th international conference on machine learning (ICML) (Long Beach, CA: PMLR), 6105–6114.
Mohsen, A., Eslamlou, A. D., and Pekcan, G. (2020). Data-driven structural health monitoring and damage detection through deep learning: state-of-the-art review. Sensors 20 (10), 2778. doi:10.3390/s20102778
Nhat-Duc, H., Nguyen, Q.-L., and Tran, V.-D. (2018). Automatic recognition of asphalt pavement cracks using metaheuristic optimized edge detection algorithms and convolution neural network. Autom. Constr. 94, 203–213. doi:10.1016/j.autcon.2018.07.008
Otieno, M. B., Alexander, M. G., and Beushausen, H. D. (2010). Corrosion in cracked and uncracked concrete-influence of crack width, concrete quality and crack reopening. Mag. Concr. Res. 62 (6), 393–404. doi:10.1680/macr.2010.62.6.393
Phares, B. M., Washer, G. A., Rolander, D., Graybeal, B. A., and Moore, M. (2004). Routine highway Bridge inspection condition documentation accuracy and reliability. J. Bridge Eng. 9 (4), 403–413. doi:10.1061/(asce)1084-0702(2004)9:4(403)
Piao, W. (2019). Research on pavement crack segmentation algorithm in complex environment. Zhengzhou University.
Rajadurai, R. S., and Kang, S. T. J. A. S. (2021). Automated vision-based crack detection on concrete surfaces using deep learning. Appl. Sci. 11 (11), 5229. doi:10.3390/app11115229
Ronneberger, O., Fischer, P., and Brox, T. J. S. (2015). “U-Net: Convolutional networks for biomedical image segmentation,” in Proc., International Conference on Medical image computing and computer-assisted intervention (Springer), 234–241.
Sga, B., Mgta, B., and Hsyb, C. (2021). Robustness of convolutional neural network models in hyperspectral noisy datasets with loss functions. Comput. Electr. Eng. 90, 107009. doi:10.1016/j.compeleceng.2021.107009
Sharma, M., Anotaipaiboon, W., and Chaiyasarn, K. (2018). Concrete crack detection using the integration of convolutional neural network and support vector machine. Sci. Technol. Asia. 30, 19–28. Available online at: https://tci-thaijo.org/index.php/SciTechAsia.
Shengyuan, L., and Xuefeng, Z. (2019). Image-Based concrete crack detection using convolutional neural network and exhaustive search technique. Adv. Civ. Eng. 2019, 1–12. doi:10.1155/2019/6520620
Silva, W. R. L. d., and Lucena, D. S. d. (2018). Concrete cracks detection based on deep learning image classification. Proceedings 2 (8), 489. doi:10.3390/icem18-05387
Simonyan, K., and Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. Comput. Sci. doi:10.48550/arXiv.1409.1556
Song, W., Jia, G., Jia, D., and Zhu, H. (2019). Automatic pavement crack detection and classification using multiscale feature attention network. IEEE Access 7, 171001–171012. doi:10.1109/access.2019.2956191
Szegedy, C., Liu, W., Jia, Y., Sermanet, P., and Rabinovich, A. (2014). “Going deeper with convolutions,” in Proc., Proceedings of the IEEE conference on computer vision and pattern recognition, 1–9.
Tan, M., and Le, Q. (2021). “EfficientNetV2: smaller models and faster training,” in Proceedings of the 38th international conference on machine learning, PMLR, proceedings of machine learning research, 10096–10106.
Tan, C., Sun, F., Kong, T., Zhang, W., Yang, C., and Liu, C. (2018). “A Survey on deep transfer learning,” in Proc., International conference on artificial neural networks, 270–279.
Yin, X., Chen, Y., Bouferguene, A., Zaman, H., Al-Hussein, M., and Kurach, L. (2020). A deep learning-based framework for an automated defect detection system for sewer pipes. Autom. Constr. 109, 102967. doi:10.1016/j.autcon.2019.102967
Yu, Z., Shen, Y., and Shen, C. (2021). A real-time detection approach for bridge cracks based on YOLOv4-FPM. Automation Constr. 122, 103514. doi:10.1016/j.autcon.2020.103514
Yusof, N., Osman, M. K., Noor, M., Ibrahim, A., and Yusof, N. M. (2018). “Crack detection and classification in asphalt pavement images using deep convolution neural network,” in Proc., 2018 8th IEEE international conference on control System, computing and engineering (ICCSCE), 227–232.
Zhang, J., and Stang, H. (1998). Applications of stress crack Width relationship in predicting the flexural behavior of fibre-reinforced concrete. Cem. Concr. Res. 28 (3), 439–452. doi:10.1016/s0008-8846(97)00275-5
Zhang, J., Lu, C., Wang, J., Wang, L., and Yue, X. G. (2019). Concrete cracks detection based on FCN with dilated convolution. Appl. Sci. 9 (13), 2686. doi:10.3390/app9132686
Zhao, Z. Q., Zheng, P., Xu, S. T., and Wu, X. (2019). Object detection with deep learning: a review. IEEE Trans. Neural Netw. Learn Syst. 30, 3212–3232. doi:10.1109/tnnls.2018.2876865
Keywords: convolutional neural networks (CNNs), crack recognition and classification, semantic segmentation, transfer learning, robustness, generalization ability
Citation: Chen L, Yao H, Gan K, Huang Z, Zhang J, Ng C-T and Fu J (2025) Evaluation of MobileNetV3-Large for crack classification across low- and high-resolution images. Front. Built Environ. 11:1724879. doi: 10.3389/fbuil.2025.1724879
Received: 14 October 2025; Accepted: 18 November 2025;
Published: 10 December 2025.
Edited by:
Tomasz Garbowski, Poznan University of Life Sciences, Poland
Reviewed by:
Anna Knitter-Piątkowska, Poznań University of Technology, Poland
Tomasz Gajewski, Poznań University of Technology, Poland
Copyright © 2025 Chen, Yao, Gan, Huang, Zhang, Ng and Fu. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Jing Zhang, zhangjing@gzhu.edu.cn; Jiyang Fu, jiyangfu@gzhu.edu.cn