Damage detection on steel-reinforced concrete produced by corrosion via YOLOv3: A detailed guide

Damage assessment of reinforced concrete elements is one of the main activities in infrastructure maintenance. Among the damage mechanisms affecting these elements, corrosion of the reinforcement is particularly critical and requires careful consideration. Governments invest substantial economic resources in this activity every year. However, most damage-assessment methodologies rely on visual inspection, which is open to subjective interpretation, produces inconsistent results, and requires considerable time and resources. This study evaluates the performance of real-time object detection using You Only Look Once, version 3 (YOLOv3), for detecting corrosion damage in concrete structures. YOLOv3 is based on a complex but efficient convolutional neural network, which was trained here on a dataset collected and labeled by the authors. Two training stages were established to improve the model precision, using transfer learning with medium- and high-resolution training images. The test results, obtained on validation photographs and videos, show satisfactory detection of corrosion in concrete and demonstrate the capabilities of explainable artificial intelligence and its applications in civil engineering.


Introduction
Artificial intelligence (AI) applications have been growing in numerous fields of science and technology. Several major approaches within AI, such as machine learning (ML) and deep learning (DL), are being developed and applied to a large number of problems Naser (2023). These methods address particular kinds of analysis problems; regression, classification, supervision, and recommendation are some of the most common. They have been successfully implemented in different knowledge sectors and industries Guzmán-Torres et al. (2022), and the construction industry is trying to keep up with these technological developments. Because construction is such a broad and essential sector for society, the potential applications of AI in it are innumerable. However, this breadth is also a disadvantage, because the many variables involved make the assessment process difficult to control Torres et al. (2020).
Civil engineering focuses on designing, constructing, supervising, and maintaining urban infrastructure. Most of the existing infrastructure is built with reinforced concrete, which combines the two most widely used construction materials worldwide, concrete and steel Martinez-Molina et al. (2021); Guzmán-Torres et al. (2021e). For this reason, damage assessment in reinforced concrete must be carried out effectively, efficiently, and objectively.
Corrosion is one of the main problems affecting reinforced concrete infrastructure: it directly attacks the reinforcing steel, the structural backbone, which becomes exposed to the environment through spalling, and it compromises the safety and durability of reinforced concrete structures Castañeda-Valdéz and Rodriguez-Rodriguez (2014). Several approaches are used to monitor damage in concrete structures, and most of them rely on visual inspection. In most cases, this methodology becomes subjective and, consequently, may lack consistency; in addition, a considerable amount of time and economic resources is needed to evaluate the current state of concrete structures Guzmán-Torres et al. (2022). When the inspection methods are not visual, electrochemical and non-destructive tests (NDTs) such as ultrasonic pulse velocity, electrical resistivity, carbonation resistance, and resonance frequency are usually applied and have produced acceptable results Bungey and Grantham (2006); Breysse et al. (2021); Smirnova et al. (2020). However, these techniques require sophisticated equipment and skilled labor, and NDTs should be complemented with other techniques to provide a more comprehensive assessment of the concrete condition. Some of the limitations of NDTs are as follows:
• Interpretation of results. NDT results usually require interpretation by trained professionals, and the operator's skills can affect the results.
• Variability in concrete properties. The different conditions of the structures may affect the accuracy and reliability of NDT outcomes.
• Cost. NDT methods can be expensive, and the cost depends on the size of the structure to be analyzed.
• Surface conditions. For instance, the moisture level of the surface can affect the accuracy of the results.
These limitations highlight the necessity of implementing direct and feasible methodologies like ML techniques.
Computer vision (CV), a field in which DL now plays a central role, focuses on vision systems that execute specific tasks on images and videos, such as detecting, classifying, and segmenting objects in images, photographs, and videos. The state of the art shows successful applications of AI methods to monitor concrete materials and structures, and current research includes multiple DL-related works on typical analysis problems. Some of them concern the prediction of concrete properties and of the behavior of concrete under different kinds of loads (compressive, tensile, and flexural strength), the analysis of mixtures that include natural organic polymers, and other properties of concrete materials Guzmán-Torres et al. (2021a); Tahwia et al. (2021); Guzmán-Torres et al. (2021d); Bui et al. (2018); Guzmán-Torres et al. (2021c); Naderpour et al. (2018); Yaseen et al. (2018); Deng et al. (2018); Behnood et al. (2017); Belyakov et al. (2021); Yakovlev et al. (2021); Tayeh et al. (2022); Zeyad et al. (2022).
Studies such as the analysis of concrete under extreme load conditions, corrosion risk estimation, methods for predicting the resistance of concrete elements exposed to fire, and the estimation of durability in concrete specimens have been carried out with computational approaches Naser (2021a); Guzmán-Torres et al. (2021b); Naser and Kodur (2022); Naser (2021b); Guzmán-Torres (2022).
Several notable ML works deserve mention, as the use of these techniques continues to increase across applications. For example, Solhmirzaei et al. (2020) presented a data-driven ML framework that uses multiple ML algorithms to predict the failure mode and shear capacity of ultra-high-performance concrete (UHPC) beams. The importance of explainability in ML models is highlighted by Cakiroglu et al. (2022), who developed data-driven ML models from 719 experiments to predict the axial compression capacity of rectangular concrete-filled steel tubular columns.
ML algorithms have also improved the evaluation of infrastructure. In a recent study, Zhen Sun et al. (2022) proposed an ML-based method to evaluate the effectiveness of tuned mass dampers, using seven ML techniques to build predictive models from input data such as temperature and wind. Alongside these approaches, standard algorithms based on artificial neural networks (ANNs) remain prevalent, as in the study by Hemmatian et al. (2023), where the maximum fiber pull-out force and the corresponding bond slip are predicted using ANNs.
Other works provide graphical user interfaces (GUIs) that give practicing engineers new technological tools. For example, in the study by Hemmatian et al. (2023), a simple GUI was developed to estimate the shear strength of fiber-reinforced polymer-reinforced concrete beams, and it demonstrated a high level of accuracy and excellent performance.
In parallel, AI has been applied to more complex problems, such as the analysis of civil engineering images with convolutional neural networks for the segmentation, classification, and detection of failures on concrete surfaces Ranjbar et al. (2021); Cha et al. (2017). This is one of the most elegant and impressive ways to represent pathologies, issues, and behavior related to infrastructure.
This study aims to demonstrate how AI methods can aid in spotting damage in reinforced concrete, specifically corrosion damage, which is one of the most pressing problems in infrastructure today. The approach used in this study, the steps followed, the guidance on which hyperparameters should be changed to obtain better performance, and the interpretation of the results produced by the model together explain the performance of the ML model and make it more easily interpretable. ML is helpful in structural damage detection because it can identify and analyze complex patterns that are difficult for humans to detect. This study highlights the need for CV applications, emphasizing early damage detection that aids corrosion prevention and leads to increased safety, reliability, and cost savings. It also intends to demonstrate that visual inspection and damage detection of concrete can be deployed in real time, in contrast to other ML models that perform object detection only on static images.

Frontiers in Built Environment frontiersin.org
Analyzing this problem and developing solutions for it is of great interest to the construction and maintenance sector, as it can reduce the economic and human resources required.
2 The corrosion problem and a computer vision approach

Corrosion processes on concrete structures
Corrosion of embedded reinforcing steel is one of the main problems in concrete structures. It has become a crucial topic that requires the full attention of engineers who maintain concrete structures such as bridges and roads Kessler et al. (1997).
Oxidation of the reinforcing steel significantly affects the functional properties of reinforced concrete, such as the adhesion (bond) between steel and concrete. It also induces cracking and spalling of concrete surfaces, which compromises structural integrity del Valle Moreno et al. (2001).
Corrosion is a term often used for the degradation of metals caused by electrochemical processes. It causes considerable damage to buildings, bridges, ships, and cars Chang and Goldsby (2013). In concrete, the corrosion process may start when aggressive agents such as chlorides, sulfates, and carbon dioxide penetrate the concrete matrix Borges et al. (1998). In civil infrastructure, metals are often used to reinforce ceramic materials, as in reinforced concrete, and they also appear in metallic structures, liquid and gas pipelines, and electrical installations coated with insulating polymers. In reinforced concrete, ferrous alloys are the predominant metals.
Corrosion can cause critical damage to reinforced concrete, such as cracking and spalling. This damage reduces the performance of concrete structures and considerably reduces the deformation capacity of the reinforcing steel, putting the safety of the building at risk. Thus, prevention, evaluation, detection, and control of the corrosion process are of paramount interest Herrera et al. (2022).
To mitigate corrosion problems, new concrete structures should be designed with a durability criterion rather than with strength-based methods such as Duff Abrams' method and the ACI (American Concrete Institute) tables, which rely on a strength criterion. In other words, the idea is to design mixtures for durability, and the ACI provides a design method that follows this approach. Many factors in the construction process influence mixture performance, but durability is mainly related to the water-cement ratio. A water-cement ratio equal to or lower than 0.44 produces ceramic matrices with a lower percentage of interconnected pores, thus reducing the probability of corrosion problems.
The cost of failing to control corrosion in infrastructure translates into safety risks for users, building demolitions, and the need for new civil infrastructure, all of which require considerable energy and resources, both human and economic. Therefore, accurate and efficient detection of corrosion in concrete structures is of great importance in civil engineering, and CV can be a valuable tool for this purpose.

Computer vision perspective
CV is a rapidly advancing field, driven by recent refinements in AI and DL. As a society, we rely on technological tools to perform our daily activities efficiently, and CV applications have become ubiquitous in everyday life through the smart devices at our disposal. Facial recognition is one area in which CV has made significant progress; smartphones, for instance, are increasingly good at recognizing faces to unlock themselves. CV is now a broad field that encompasses traditional CV, ML, and DL algorithms. Traditional CV algorithms rely on handcrafted features to extract and identify relevant information from images, whereas ML algorithms can be used to classify images or detect objects; the support vector machine (SVM) is one commonly used ML algorithm for these tasks. DL algorithms have significantly improved the precision of many CV applications, including object detection, image classification, and image segmentation.

Computer vision pipeline
Vision systems consist of two primary components: sensing devices, such as cameras, and interpreting devices, typically workstations or other computing devices. While the specific problems addressed by CV applications can vary, most vision systems use a sequence of distinct steps to process and analyze image data. These steps are commonly referred to as a CV pipeline, which involves acquiring input data, preprocessing the data, feature extraction, analysis, and recognition, and finally, the application of ML techniques to make predictions based on the information extracted from the image. Figure 1 illustrates the steps involved in a typical CV pipeline.

How computers see the images
Images and videos are the typical input data of CV applications. A grayscale image, for instance, can be represented in matrix notation: CV approaches treat a grayscale photograph as a function of two variables, x and y, defined over a two-dimensional area. A digital image is a grid of pixels, the raw building blocks of an image, and each pixel represents the light intensity at a given location in the photograph. When we look at an image or a photograph, we see objects, surfaces, colors, landscapes, and textures. That is not the case for computers: for a computer, a grayscale image is a two-dimensional array of pixel values. Figure 2 represents an image with a size of 44 × 44, that is, 44 pixels wide and 44 pixels high, for a total of 1,936 pixels. Each entry in the array represents the brightness of that pixel, where 0 represents black and 255 represents white. This applies to grayscale images; color images are different. In a color image, each pixel is described by three numbers instead of one, giving the intensities of red (R), green (G), and blue (B) in that pixel; this is known as the RGB scheme.
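As a minimal sketch of this representation, assuming plain Python lists stand in for an imaging library, the snippet below builds a 44 × 44 grayscale array, counts its pixels, and stores a color pixel as an (R, G, B) triple. The grayscale conversion uses the ITU-R BT.601 luma weights, which are an assumption added for illustration and are not part of the workflow described in this study.

```python
# Minimal sketch: images as nested Python lists (no imaging library assumed).
WIDTH, HEIGHT = 44, 44

# Grayscale: one 8-bit intensity per pixel, 0 = black and 255 = white.
gray = [[0 for _ in range(WIDTH)] for _ in range(HEIGHT)]
gray[10][20] = 255  # set a single white pixel at row 10, column 20


def pixel_count(image):
    """Number of pixels in a 2-D image stored as a list of rows."""
    return len(image) * len(image[0])


# Color: three intensities (R, G, B) per pixel instead of one.
rgb = [[(0, 0, 0) for _ in range(WIDTH)] for _ in range(HEIGHT)]
rgb[0][0] = (180, 90, 30)  # an illustrative rust-brown pixel


def rgb_to_gray(pixel):
    """Collapse an (R, G, B) triple into one intensity (ITU-R BT.601 luma weights)."""
    r, g, b = pixel
    return round(0.299 * r + 0.587 * g + 0.114 * b)


print(pixel_count(gray))  # 1936
```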

YOLOv3: A general overview
YOLOv3 is a single-stage algorithm for real-time object detection. It is built on a complex CNN that significantly improves on previous versions, YOLO Redmon et al. (2016) and YOLOv2 (YOLO9000) Redmon and Farhadi (2017). Figure 3, taken from Zhao et al. (2020), shows the general architecture of the YOLOv3 algorithm implemented in this study. The YOLOv3 architecture is primarily based on Darknet-53, which contains 23 residual units He et al. (2016). Each residual unit includes a 3 × 3 and a 1 × 1 convolutional operation, and at the end of each unit a shortcut connection adds the unit's input to its output. These residual units compute the convolutional feature maps over each complete input image. Each convolutional layer is composed of three sequential layers: a convolution layer O'Shea and Nash (2015), a batch normalization layer Ioffe and Szegedy (2015), and a leaky rectified linear unit (ReLU) layer Maas et al. (2013).
Downsampling in the YOLOv3 backbone is performed by five separate convolutional layers. Each of these layers uses a stride of two, which halves the feature-map dimensions and makes the operations performed during training more efficient. The ImageNet dataset Deng et al. (2009) is used to pre-train the backbone Redmon and Farhadi (2018). YOLOv3 operates by splitting the input image into a grid of cells, where each cell is responsible for predicting the objects whose bounding-box center falls within it. Each grid cell predicts bounding boxes described by the x and y coordinates, the width and height, and a confidence score, together with class predictions. Finally, the bounding boxes and the class probability map are combined into a final set of class labels and bounding boxes.
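The downsampling arithmetic can be checked with a short sketch. It assumes the Darknet-53 stage layout reported in the YOLOv3 technical report (1, 2, 8, 8, and 4 residual units after each stride-2 convolution) and the three YOLOv3 output strides of 32, 16, and 8; the function names are hypothetical. With the default 416 × 416 input, the backbone ends at a 13 × 13 map, and the three output grids yield 10,647 predicted boxes at three boxes per cell.

```python
# Darknet-53 stage layout from the YOLOv3 technical report (Redmon and Farhadi, 2018):
# each stage starts with a stride-2 convolution followed by N residual units.
RESIDUAL_UNITS_PER_STAGE = [1, 2, 8, 8, 4]


def backbone_summary(input_size=416):
    """Feature-map sizes after each stride-2 convolution, total residual
    units, and number of convolutional layers in the backbone."""
    sizes = []
    size = input_size
    for _ in RESIDUAL_UNITS_PER_STAGE:
        size //= 2  # a stride-2 convolution halves width and height
        sizes.append(size)
    residual_units = sum(RESIDUAL_UNITS_PER_STAGE)
    # 1 initial 3x3 conv + 5 downsampling convs + (1x1 + 3x3) per residual unit.
    # The 53rd layer of the ImageNet classifier is a fully connected layer.
    conv_layers = 1 + len(RESIDUAL_UNITS_PER_STAGE) + 2 * residual_units
    return sizes, residual_units, conv_layers


def output_grids(input_size=416, strides=(32, 16, 8), boxes_per_cell=3):
    """Grid sizes of the three YOLOv3 output maps and total predicted boxes."""
    grids = [input_size // s for s in strides]
    total_boxes = boxes_per_cell * sum(g * g for g in grids)
    return grids, total_boxes


print(backbone_summary(416))  # ([208, 104, 52, 26, 13], 23, 52)
print(output_grids(416))      # ([13, 26, 52], 10647)
print(output_grids(832))      # ([26, 52, 104], 42588)
```

The last call corresponds to the --img 832 setting used in this study: a larger input keeps the same architecture but produces finer grids and many more candidate boxes.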
YOLOv3 predicts three bounding boxes at each grid cell on three output feature maps. In general, each predicted box has one confidence variable, $t_c$, four class variables ($t_i$, $i = 1, 2, 3, 4$), and four coordinate variables ($t_x$, $t_y$, $t_w$, $t_h$). All the predicted variables are transformed into the object confidence, the probability of each class, and the location to generate the predicted results Zhang et al. (2020). The object confidence $C$ denotes the likelihood that a box contains an object, and it is computed with a sigmoid function, defined by the following equation:

$$C = \sigma(t_c) = \frac{1}{1 + e^{-t_c}} \qquad (1)$$

Another meaningful task is location prediction. YOLOv3 predicts the center coordinates of the bounding box relative to the location of the grid cell, so that the center coordinates within the cell lie between 0 and 1. Figure 4 shows that if the grid cell is offset from the upper-left corner of the image by $(C_x, C_y)$, then the predicted bounding box has center coordinates $(b_x, b_y)$.
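The location prediction can be made concrete with a sketch of the box parameterization given in the YOLOv3 technical report: $b_x = \sigma(t_x) + C_x$, $b_y = \sigma(t_y) + C_y$, $b_w = p_w e^{t_w}$, and $b_h = p_h e^{t_h}$, where $p_w$ and $p_h$ are the width and height of the anchor (prior) box. The anchor size, the stride argument, and the function names below are assumptions for illustration; the anchor (116, 90) is one of the default COCO anchors, not a value tuned for corrosion.

```python
import math


def sigmoid(x):
    """Logistic function; squashes a raw output into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def decode_box(t, cell, anchor, stride):
    """Decode raw outputs for one predicted box into image coordinates.

    t      : (t_x, t_y, t_w, t_h, t_c) raw network outputs
    cell   : (C_x, C_y) grid-cell offset, in cells
    anchor : (p_w, p_h) anchor (prior) size, in pixels
    stride : pixels per grid cell on this output map
    """
    t_x, t_y, t_w, t_h, t_c = t
    c_x, c_y = cell
    p_w, p_h = anchor
    b_x = (sigmoid(t_x) + c_x) * stride  # box center x, in pixels
    b_y = (sigmoid(t_y) + c_y) * stride  # box center y, in pixels
    b_w = p_w * math.exp(t_w)            # box width, in pixels
    b_h = p_h * math.exp(t_h)            # box height, in pixels
    confidence = sigmoid(t_c)            # object confidence C
    return b_x, b_y, b_w, b_h, confidence


# Zero raw outputs place the box at the cell center with the anchor's size.
print(decode_box((0.0, 0.0, 0.0, 0.0, 0.0), (6, 6), (116, 90), 32))
# (208.0, 208.0, 116.0, 90.0, 0.5)
```

Because the sigmoid keeps each center offset inside its cell, a cell can only predict boxes whose centers fall within it, which matches the grid-cell responsibility described above.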
The last fully connected layer of the model uses a softmax classifier to detect the object of interest, which, in this case, is corrosion damage in concrete structures. However, since this problem involves only one class, corrosion damage, the output class layer corresponds to only one node. The softmax function transforms the output variables into a multi-class probability distribution, with each class corresponding to a different object category. Specifically, the model predicts the presence or absence of corrosion damage on reinforcing steel that is exposed to weathering through spalling or affected by corrosion processes within the cementitious matrix.

FIGURE 4
Bounding box, anchor, and location to the prediction box process.

Performance metrics
In this study, the performance of the YOLOv3 model was evaluated using established classification metrics, namely precision, recall, and F-score, in all stages: training, testing, and validation.
Precision is the ratio between the number of correctly retrieved (relevant) instances and the total number of retrieved instances, and it is determined by the following expression:

$$\text{Precision} = \frac{TP}{TP + FP} \qquad (2)$$

where TP and FP represent the numbers of true positives and false positives, respectively. Recall, also known as sensitivity, accounts for the false negatives, FN, and measures the fraction of actual objects that the model correctly detects; it is given by Eq. 3:

$$\text{Recall} = \frac{TP}{TP + FN} \qquad (3)$$
The F-score summarizes the performance of a model in a single metric by combining precision and recall in a simple formulation:

$$F = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}} \qquad (4)$$

In addition to these classification metrics, a detection indicator, Intersection over Union (IoU), was applied to evaluate the model. This indicator evaluates how well the YOLOv3 framework localizes corrosion in concrete images during training and testing. Its essence can be summarized as follows: only the anchor with the highest IoU with the ground-truth box is responsible for predicting the object. Mathematically, the IoU is defined by

$$\text{IoU} = \frac{\text{Area of overlap}}{\text{Area of union}} \qquad (5)$$
Eq. 5 shows that IoU is simply a ratio of areas, which can be easily represented graphically, as shown in Figure 5.
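The metrics described in this section (precision, recall, F-score, and IoU) can be sketched in a few lines of Python. The functions are hypothetical helpers, boxes are assumed to be in corner format (x_min, y_min, x_max, y_max), the counts in the usage lines are toy values, and returning 0.0 for empty denominators is a design choice of this sketch.

```python
def precision(tp, fp):
    """Share of predicted boxes that are correct."""
    return tp / (tp + fp) if tp + fp else 0.0


def recall(tp, fn):
    """Share of labeled objects that were detected."""
    return tp / (tp + fn) if tp + fn else 0.0


def f_score(p, r):
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if p + r else 0.0


def iou(box_a, box_b):
    """Intersection over Union of two (x_min, y_min, x_max, y_max) boxes."""
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])
    overlap = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - overlap
    return overlap / union if union else 0.0


print(precision(8, 2), recall(8, 8))      # 0.8 0.5
print(iou((0, 0, 2, 2), (1, 1, 3, 3)))    # overlap 1, union 7, so 1/7
```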
In the context of object detection, standard indicators such as average precision (AP), mean average precision (mAP), and mean average precision at an IoU threshold of 0.5 (mAP0.5) must also be considered. AP is a critical metric in object detection problems, defined as the area under the precision-recall curve. The mAP is obtained by averaging the AP over all classes or IoU thresholds. The parameter mAP0.5 is the mAP calculated at an IoU threshold of 50%, and mAP0.5:0.95 is the average mAP calculated over IoU thresholds from 50% to 95% in steps of 5%.
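AP and mAP0.5:0.95 can be sketched as follows, assuming the all-point interpolation convention used by PASCAL VOC since 2010; other benchmarks, such as COCO, sample the curve at 101 recall points, so reported numbers can differ slightly. The helper names are hypothetical, and the precision-recall pairs in the usage line are toy values.

```python
def average_precision(recalls, precisions):
    """Area under the precision-recall curve with all-point interpolation.

    `recalls` must be sorted in increasing order; `precisions` are paired with them.
    """
    r = [0.0] + list(recalls) + [1.0]
    p = [0.0] + list(precisions) + [0.0]
    # Precision envelope: make precision non-increasing as recall grows.
    for i in range(len(p) - 2, -1, -1):
        p[i] = max(p[i], p[i + 1])
    # Sum the rectangles under the envelope wherever recall increases.
    return sum((r[i + 1] - r[i]) * p[i + 1] for i in range(len(r) - 1))


def mean_ap(ap_values):
    """mAP: the mean of AP values over classes or over IoU thresholds."""
    return sum(ap_values) / len(ap_values)


# IoU thresholds for mAP0.5:0.95: 0.50, 0.55, ..., 0.95 (ten values).
IOU_THRESHOLDS = [round(0.5 + 0.05 * k, 2) for k in range(10)]

print(average_precision([0.5, 1.0], [1.0, 0.5]))  # 0.75
```

With a single class, as in this study, mAP0.5 reduces to the AP at an IoU threshold of 0.5, and mAP0.5:0.95 is the mean of the ten AP values computed at the thresholds above.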
Another aspect to consider is that YOLOv3 uses the mean square error (MSE) loss function to train its neural network. The loss compares the predicted bounding-box coordinates and class probabilities with the ground-truth annotations. The MSE loss penalizes larger errors more severely, which is desirable for object detection tasks where accurate bounding-box predictions are crucial.
Using the MSE loss in YOLOv3 also has the advantage of being computationally efficient, as it can be easily computed with vectorized operations. However, other object detection models may use different loss functions depending on the specific requirements of the task. The MSE is given by

$$\text{MSE} = \frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2 \qquad (6)$$

where $N$ is the number of samples, $y_i$ is the true value for sample $i$, and $\hat{y}_i$ is the predicted value for sample $i$.
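A direct transcription of the MSE formula, written as a hypothetical helper over plain Python sequences. The two calls at the end illustrate the property mentioned above: doubling an error quadruples its penalty.

```python
def mse(y_true, y_pred):
    """Mean squared error between two equal-length, non-empty sequences."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((y - y_hat) ** 2 for y, y_hat in zip(y_true, y_pred)) / len(y_true)


print(mse([0], [1]), mse([0], [2]))  # 1.0 4.0
```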

Dataset details
To address the lack of an established image dataset of corrosion damage in concrete structures, this research provides an image dataset built by the authors, CONCORNET2023. The dataset contains 790 images, each showing a concrete structure with some sign of corrosion damage.
The initial dataset consisted of 159 images, which were augmented to 790 images by introducing distortions. CV tasks can be achieved with a small dataset, but it is more challenging than with a larger one. Training DL models for CV tasks requires a significant amount of labeled data to learn and generalize well. With a smaller dataset, the model may not learn all the necessary features and patterns in the data, leading to overfitting or poor performance.
To overcome these challenges, there are several techniques that can be used to improve the performance of CV models with small datasets, such as data augmentation, transfer learning, and regularization. While these techniques can help improve the performance of models trained on small datasets, it is relevant to keep in mind that the model performance will ultimately be limited by the quality of the data available. Therefore, it is always recommended to collect as much high-quality data as possible to train robust and accurate CV models. For this study, data augmentation was implemented to address the small data issue.
All the images in CONCORNET2023 were captured with smartphones at different sites, with random perspectives, angles, distances, and lighting conditions. This variety allows the generation of a robust model capable of detecting corrosion damage on concrete structures under diverse circumstances. Image resolutions range from 1,280 × 960 up to 12,400 × 12,400 pixels; these differing resolutions make it more challenging for the model to minimize the loss function during training. Recognizing damaged elements under such different conditions makes the model robust and flexible.
By default, YOLOv3 accepts an input image size of 416 × 416; however, it can also be trained and used with larger input sizes, such as 608 × 608 and 832 × 832. The images in CONCORNET2023 are larger than the default YOLOv3 input size. This can be addressed by setting the --img parameter during training; in this study, it was set to --img 832. The input image size can also be adjusted during training and inference through data augmentation and resizing techniques, which can improve the robustness and precision of the model for different input image sizes and aspect ratios. Figure 6 shows some representative photographs of the image dataset built and used for training by the authors.
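One common way to bring images of very different resolutions to a fixed input size is letterbox resizing: scale the image so that its longer side matches the target and pad the shorter side. The sketch below computes only the geometry (scale, new size, and padding) and maps a box into the resized frame. The letterbox approach, the function names, and the centered padding are assumptions for illustration, not a description of the framework's internal code.

```python
def letterbox_params(width, height, target=832):
    """Scale factor, resized size, and padding that fit a width x height image
    into a target x target square while preserving its aspect ratio."""
    scale = min(target / width, target / height)
    new_w, new_h = round(width * scale), round(height * scale)
    pad_x = (target - new_w) / 2  # added on both the left and the right
    pad_y = (target - new_h) / 2  # added on both the top and the bottom
    return scale, (new_w, new_h), (pad_x, pad_y)


def map_box(box, scale, pad):
    """Map a pixel box (x_min, y_min, x_max, y_max) into the letterboxed frame."""
    x_min, y_min, x_max, y_max = box
    pad_x, pad_y = pad
    return (x_min * scale + pad_x, y_min * scale + pad_y,
            x_max * scale + pad_x, y_max * scale + pad_y)


# Smallest and largest resolutions reported for CONCORNET2023.
print(letterbox_params(1280, 960))
print(letterbox_params(12400, 12400))
```

For a 1,280 × 960 photograph, the scale is 0.65, the resized image is 832 × 624, and 104 pixels of padding are added above and below; labeled boxes must be transformed with the same scale and offsets.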

Annotated dataset
Image collection is one of the most important tasks in CV projects because the collected images feed the model; therefore, they need to be correctly annotated, or labeled. In this step, the model is taught what is what (image labeling), and consequently it learns to identify the object it has to detect. Both open-access and proprietary tools are available for image labeling. In this research, all images were labeled with a standard open-access tool, LabelImg, which lets the user visualize each image and manually draw a bounding box around the object to be identified by the detector. Figure 7 shows the LabelImg interface and the labeling process for some sample images.
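LabelImg can save annotations either in Pascal VOC format (pixel corner coordinates) or in YOLO text format. As a sketch of the relation between the two, assuming a single class with index 0 for corrosion (a hypothetical choice), the helpers below convert a corner box into the normalized "class x_center y_center width height" line used in YOLO label files, and back.

```python
def voc_to_yolo(box, img_w, img_h, class_id=0):
    """Convert a pixel box (x_min, y_min, x_max, y_max) into a YOLO label line:
    'class x_center y_center width height', all normalized to [0, 1]."""
    x_min, y_min, x_max, y_max = box
    x_c = (x_min + x_max) / 2 / img_w
    y_c = (y_min + y_max) / 2 / img_h
    w = (x_max - x_min) / img_w
    h = (y_max - y_min) / img_h
    return f"{class_id} {x_c:.6f} {y_c:.6f} {w:.6f} {h:.6f}"


def yolo_to_voc(line, img_w, img_h):
    """Parse a YOLO label line back into (class_id, pixel corner box)."""
    cls, x_c, y_c, w, h = line.split()
    x_c, w = float(x_c) * img_w, float(w) * img_w
    y_c, h = float(y_c) * img_h, float(h) * img_h
    return int(cls), (x_c - w / 2, y_c - h / 2, x_c + w / 2, y_c + h / 2)


line = voc_to_yolo((100, 200, 300, 400), img_w=1000, img_h=800)
print(line)  # 0 0.200000 0.375000 0.200000 0.250000
```

Normalizing by the image size makes the labels independent of the resolution, which is convenient for a dataset whose images range from 1,280 × 960 to 12,400 × 12,400 pixels.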

Data augmentation
Generating reliable and robust models is essential when developing AI applications, and the availability of large amounts of data helps achieve this goal. In some cases, it is necessary to enlarge the image dataset because the image conditions during inference may differ from those considered during training. Therefore, it is crucial to add noise to the images, distorting the original photographs in different ways to generate modified versions of them Elgendy (2020). By introducing noise into the images, the model is trained and prepared to receive photographs with realistic and varied conditions when its precision is tested. These modifications include changes in lighting and brightness, in the arrangement of the objects to be detected, in textures, and in offsets, angles, and rotations. Including distortions in the dataset also increases the number of photographs.

FIGURE 6
Representative images contained in CONCORNET2023.
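The distortions described above can be sketched on a toy grayscale array. This is an illustrative assumption, not the augmentation pipeline used for CONCORNET2023: only a brightness change and a horizontal flip are shown, the brightness range is hypothetical, and labels are assumed to be YOLO tuples (class, x_center, y_center, width, height) with normalized coordinates, so a flip only mirrors the x center.

```python
import random


def adjust_brightness(image, factor):
    """Scale every 8-bit grayscale pixel by `factor`, clipping to [0, 255]."""
    return [[min(255, max(0, round(v * factor))) for v in row] for row in image]


def flip_horizontal(image, labels):
    """Mirror the image left-right and mirror the x center of each YOLO label
    (class, x_center, y_center, width, height), all normalized to [0, 1]."""
    flipped = [list(reversed(row)) for row in image]
    new_labels = [(c, 1.0 - xc, yc, w, h) for c, xc, yc, w, h in labels]
    return flipped, new_labels


def augment(image, labels, rng):
    """Return one randomly distorted copy of (image, labels)."""
    out = adjust_brightness(image, rng.uniform(0.6, 1.4))  # hypothetical range
    if rng.random() < 0.5:
        out, labels = flip_horizontal(out, labels)
    return out, labels


rng = random.Random(42)
toy_image = [[10, 200, 250], [0, 128, 255]]
toy_labels = [(0, 0.25, 0.5, 0.1, 0.2)]
copies = [augment(toy_image, toy_labels, rng) for _ in range(4)]  # 4 variants
```

Geometric distortions must update the labels together with the pixels, whereas photometric ones, such as brightness, leave the labels unchanged.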
Data augmentation is considered an efficient regularization method and has become an indispensable step in the CV pipeline to improve model performance. Figure 8 shows the effect of introducing noise into a sample image from the dataset built in this work. The parameters used to generate each modification in the data augmentation step are listed as follows:

FIGURE 7
Labeling process of some representative images.

Organizing directories
To run the YOLOv3 framework satisfactorily, the directories must be correctly organized, since the information used (images and labels) must be available at the paths expected by the YOLOv3 framework.
The YOLOv3 folder structure includes a folder called data, which contains several subfolders; of these, train, test, and val are of interest because they store the images used for training, testing, and validation. Each of these folders contains two subfolders, images and labels. The first contains the photographs, whereas labels stores the text file corresponding to each image in the images folder. Each text file must contain the coordinates of the location of the labeled object.
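A sketch of the folder layout described above, using only the standard library. The helper names are hypothetical, and the rule that pairs an image with its label (same file stem, .txt extension, sibling labels folder) is an assumption consistent with the described layout, not a quotation of the framework's code.

```python
import tempfile
from pathlib import Path

SPLITS = ("train", "test", "val")


def make_layout(root):
    """Create data/<split>/images and data/<split>/labels under `root`."""
    for split in SPLITS:
        for sub in ("images", "labels"):
            (Path(root) / "data" / split / sub).mkdir(parents=True, exist_ok=True)


def label_path_for(image_path):
    """Label file paired with an image: same stem, .txt extension, in the
    sibling 'labels' folder (data/train/images/a.jpg -> data/train/labels/a.txt)."""
    p = Path(image_path)
    return p.parent.parent / "labels" / (p.stem + ".txt")


def layout_demo():
    """Build the tree in a throwaway directory and list its folders."""
    with tempfile.TemporaryDirectory() as tmp:
        make_layout(tmp)
        return sorted(
            p.relative_to(tmp).as_posix() for p in Path(tmp).rglob("*") if p.is_dir()
        )


print(layout_demo())
```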
For the image distribution, the authors used a standard split ratio of 80% for training, 10% for testing, and 10% for validation. Thus, 632 images were selected for training, 79 for testing, and 79 for validation.
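The 80/10/10 split can be reproduced with a seeded shuffle. The function below is a hypothetical sketch; the validation set takes whatever remains after rounding, which for 790 images gives 632, 79, and 79 images.

```python
import random


def split_dataset(items, train=0.8, test=0.1, seed=0):
    """Shuffle reproducibly and split into (train, test, val); val takes the remainder."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_train = round(len(items) * train)
    n_test = round(len(items) * test)
    return (items[:n_train],
            items[n_train:n_train + n_test],
            items[n_train + n_test:])


train_set, test_set, val_set = split_dataset(range(790))
print(len(train_set), len(test_set), len(val_set))  # 632 79 79
```

Fixing the seed keeps the split identical across training runs, so results from different hyperparameter settings are compared on the same validation images.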
Note that all the images must be in JPEG format; otherwise, the model will raise an error at the compilation stage due to incompatibility with other formats. The scripts that execute the model are located in the root folder of the YOLOv3 directory. Without this organization, the model cannot be run appropriately.

Training the model
When training the YOLOv3 model, several hyperparameters need to be tuned. By default, YOLOv3 proposes specific values for its hyperparameters and also offers various configurations for them; the choice of configuration depends on the performance level the user wants to obtain and on the problem to be addressed. By regular compilations, the
The training stage was carried out in two phases: the first followed the default settings suggested by the framework, and the second adjusted the hyperparameters that required tuning. The first training stage served to observe how the model performed on the image dataset. However, the model performance over hundreds of iterations revealed the need to tune hyperparameters such as the learning rate, the momentum, the patience, and the weight decay, all of which are useful for overcoming overfitting problems.
Occasionally, the default YOLOv3 values work well during training. This mostly depends on the object to be detected, keeping in mind that the hyperparameters proposed in the YOLOv3 framework were chosen for training on the COCO dataset, a large-scale object detection, segmentation, and captioning dataset introduced by Microsoft Lin et al. (2014).
For the second training stage, 30 hyperparameter combinations were tested to improve the model performance. The best hyperparameter values for detecting corrosion damage, at least for this work, are as follows:
Note that these configurations were applied starting from the weights of the best epoch of the first training stage, i.e., a form of partial transfer learning. The outcomes of the training stages are discussed in the next section. For the moment, it is relevant to mention that the success of the training stage depends on many factors, such as a correct labeling process, hyperparameter tuning over several experiments, and the robustness of the image dataset.
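The tuning of the second stage can be framed as a search over hyperparameter combinations. The sketch below draws 30 random configurations and keeps the one with the best validation score. Everything in it is an assumption: the search ranges, the log-uniform sampling, the parameter names, and the evaluate callback, which in practice would train from the best stage-one weights and return a validation metric. It does not reproduce the values selected in this study.

```python
import math
import random

# Hypothetical search ranges and names, for illustration only; they are NOT
# the values selected in this study.
SEARCH_SPACE = {
    "lr0": (1e-4, 1e-2),           # initial learning rate (log-uniform)
    "momentum": (0.80, 0.98),      # SGD momentum (uniform)
    "weight_decay": (1e-5, 1e-3),  # L2 weight decay (log-uniform)
}


def log_uniform(rng, low, high):
    """Sample so that each order of magnitude in [low, high] is equally likely."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


def sample_configs(n=30, seed=0):
    """Draw n random hyperparameter combinations reproducibly."""
    rng = random.Random(seed)
    return [
        {
            "lr0": log_uniform(rng, *SEARCH_SPACE["lr0"]),
            "momentum": rng.uniform(*SEARCH_SPACE["momentum"]),
            "weight_decay": log_uniform(rng, *SEARCH_SPACE["weight_decay"]),
        }
        for _ in range(n)
    ]


def best_config(configs, evaluate):
    """Return the configuration with the highest score from `evaluate`,
    a caller-supplied (hypothetical) function that trains and validates."""
    return max(configs, key=evaluate)
```

Sampling the learning rate and weight decay log-uniformly spreads the 30 trials across orders of magnitude, which is usually more informative than a uniform grid for these parameters.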
4 Results and discussion

4.1 Training outcomes
As previously mentioned, the training process was executed twice. The results of the first stage are presented below, beginning with the box loss and its corresponding validation box loss in Figure 9, which shows how the learning process evolved over 400 epochs.
As depicted in Figure 9, the results of the first training stage show a clear overfitting problem during the first few epochs, with a noticeable divergence between the box loss and the corresponding box validation loss. This indicates that, although the model keeps learning from the training data, this improvement does not carry over to the validation data, which produces a performance gap between the two.
Frontiers in Built Environment
frontiersin.org

Similarly, Figure 10 shows an overfitting problem, with the training and validation curves diverging as training proceeds. The primary goal of the training stage is for the training and validation performance to converge, or at least to become equivalent, which indicates that the model performs equally well on training and validation data.
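The divergence described above can be monitored numerically. This sketch, with hypothetical function names and loss curves, computes the per-epoch gap between validation and training loss and flags a simple overfitting pattern: training loss falling while validation loss rises over a recent window of epochs. It is a heuristic for illustration, not a criterion taken from the YOLOv3 framework.

```python
def loss_gap(train_losses, val_losses):
    """Absolute gap between validation and training loss at each epoch."""
    return [abs(v - t) for t, v in zip(train_losses, val_losses)]


def is_diverging(train_losses, val_losses, window=5, tol=0.0):
    """Heuristic overfitting flag: over the last `window` epochs the training
    loss falls while the validation loss rises (both by more than `tol`)."""
    if min(len(train_losses), len(val_losses)) < window:
        return False
    t, v = train_losses[-window:], val_losses[-window:]
    return t[-1] < t[0] - tol and v[-1] > v[0] + tol


# Hypothetical curves resembling the first-stage behavior described above.
train_loss = [0.10, 0.08, 0.06, 0.05, 0.04]
val_loss = [0.09, 0.09, 0.10, 0.11, 0.12]
print("final gap:", round(loss_gap(train_loss, val_loss)[-1], 4))
print("diverging:", is_diverging(train_loss, val_loss))
```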
For YOLOv3, two loss functions are computed: the object loss and the box loss. The object loss measures how well the model predicts whether an object (here, corrosion damage) is present in the photograph, while the box loss measures how well the model delimits that object with a bounding box, by comparing the position and size of the predicted box with the bounding box drawn in the labeled images.
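The two loss terms can be illustrated with a simplified sketch. It assumes boxes given as (x1, y1, x2, y2) corners, uses 1 − IoU for the box term (one common choice; the original YOLOv3 paper uses a squared error on the box coordinates), and uses binary cross-entropy on a sigmoid objectness score. The function names are hypothetical, and real implementations sum these terms over many grid cells and anchors.

```python
import math


def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def box_loss(pred_box, true_box):
    """Simplified box loss: 1 - IoU between the prediction and the labeled box."""
    return 1.0 - iou(pred_box, true_box)


def object_loss(objectness_logit, has_object):
    """Binary cross-entropy on the objectness probability (sigmoid of a logit)."""
    p = 1.0 / (1.0 + math.exp(-objectness_logit))
    p = min(max(p, 1e-7), 1.0 - 1e-7)  # clamp to avoid log(0)
    return -(has_object * math.log(p) + (1 - has_object) * math.log(1 - p))


# Usage with hypothetical boxes: an oversized prediction around a small bar
# overlaps it only slightly, so its box loss is high.
print(box_loss((0, 0, 10, 10), (4, 4, 6, 6)))
print(object_loss(3.0, 1), object_loss(3.0, 0))
```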
In the second training stage, the overfitting problem was addressed to obtain adequate performance on the validation set. This was possible thanks to the adjustment of the YOLOv3 hyperparameters described in the previous section. The training and validation box losses are depicted in Figure 11.
After the hyperparameters were adjusted, the overfitting problem was successfully addressed, as Figure 11 shows: the difference between the training and validation loss values decreased to a negligible 0.01871 over 31 epochs. In the second training stage, the patience parameter was set to 20. Patience implements early stopping, a technique to mitigate overfitting: it monitors the validation loss and ends training when the model has not improved for that number of consecutive epochs. As a result, the second training stage lasted only 31 epochs.
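The patience mechanism can be sketched as a small early-stopping helper. The class name `EarlyStopping` and the loss curve are hypothetical; the sketch monitors the validation loss, as described above, although some YOLO implementations monitor a fitness score such as mAP instead.

```python
class EarlyStopping:
    """Stop training when the validation loss has not improved for
    `patience` consecutive epochs."""

    def __init__(self, patience=20, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best_loss = float("inf")
        self.best_epoch = -1
        self.epochs_without_improvement = 0

    def step(self, epoch, val_loss):
        """Record one epoch; return True if training should stop."""
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss = val_loss
            self.best_epoch = epoch
            self.epochs_without_improvement = 0
        else:
            self.epochs_without_improvement += 1
        return self.epochs_without_improvement >= self.patience


stopper = EarlyStopping(patience=20)
# Hypothetical validation losses: improve for 11 epochs, then plateau.
val_losses = [0.1 - 0.005 * e for e in range(11)] + [0.06] * 40
for epoch, loss in enumerate(val_losses):
    if stopper.step(epoch, loss):
        break
print(epoch + 1, "epochs run; best epoch:", stopper.best_epoch)
```

With this hypothetical curve the best validation loss occurs at epoch index 10, and training halts 20 epochs later, after 31 epochs in total; the weights from the best epoch, not the last one, are the ones worth keeping.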
As with the box loss, the object losses were computed for both the training and validation stages; they are shown in Figure 12.
As with the box loss, the object loss shows no evidence of overfitting: the difference between the training and validation stages is only 0.00423.

FIGURE 9
Training and validation performances in the box loss function during 400 epochs.

FIGURE 10
Training and validation performances in the object loss function during 31 epochs.

FIGURE 11
Training and validation performances in the box loss function during 31 epochs.

Table 1 presents the performance of both training stages, including the numerical values of other complementary metrics. These results show the importance of hyperparameter tuning for achieving good model performance: the chosen hyperparameters lead the model toward correct predictions, as evidenced by the low values of the loss functions. Notably, the overall precision of the model is 82.121%. The effect of the hyperparameters is also reflected in the accuracy of the detected objects, presented below.
Real-time object detection systems are subject to variability, and their precision depends on several factors, including the confidence threshold above which a detection is accepted; in real-time detection, it is widely adopted practice to set this threshold to at least 20%. In this context, a precision of 82.121% is regarded as satisfactory.
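To make the relationship between the confidence threshold and precision concrete, the sketch below computes precision, TP / (TP + FP), over the detections retained at a given threshold. The function name and the detection list are hypothetical, and each detection is assumed to have already been labeled as a true or false positive by matching it against the annotated boxes.

```python
def precision_at_threshold(detections, conf_threshold=0.20):
    """Precision = TP / (TP + FP) over detections kept at or above the threshold.

    `detections` is a list of (confidence, is_true_positive) pairs; the
    true-positive flag is assumed to come from matching against labeled boxes.
    """
    kept = [is_tp for conf, is_tp in detections if conf >= conf_threshold]
    if not kept:
        return 0.0
    return sum(kept) / len(kept)


# Hypothetical detections as (confidence, is_true_positive) pairs.
detections = [(0.95, True), (0.80, True), (0.55, False), (0.30, True), (0.15, False)]
for threshold in (0.20, 0.50, 0.90):
    print(threshold, precision_at_threshold(detections, threshold))
```

Raising the threshold discards low-confidence detections, which tends to increase precision at the cost of missing some damaged areas; lowering it does the opposite.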

Detection diagnostic
Following the fine-tuning process, a robust detector for concrete-corrosion damage was developed. Figure 13 shows the detector's ability to identify areas damaged by corrosion. All the photographs exhibit some level of corrosion damage, with the reinforcing steel exposed to the natural environment. This type of damage is generally associated with spalling, a prevalent pathology in concrete built with poor quality control or inadequate design.
Figure 13 shows that the developed detector effectively detects corrosion damage with high precision and confidence. Each detection carries a label generated by the model, consisting of the term "corrosion damage" followed by a confidence value, which indicates how certain the model is that the element is a corroded area. A confidence value of 1.0, the maximum possible score, means that the model is fully confident that the recognized object contains corrosion damage; the other images show similarly high values.
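In common YOLOv3 implementations, the confidence value shown next to each label is the product of the predicted objectness probability and the class probability, both obtained by applying a sigmoid to raw network outputs. The minimal sketch below reproduces that computation and a label in the style of Figure 13; the function names, the logits, and the exact label format are assumptions.

```python
import math


def sigmoid(x):
    """Logistic function mapping a raw network output to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def detection_confidence(objectness_logit, class_logit):
    """Class-specific confidence = P(object) * P(class | object)."""
    return sigmoid(objectness_logit) * sigmoid(class_logit)


def format_label(confidence, class_name="corrosion damage"):
    """Build a label consisting of the class name followed by the score."""
    return f"{class_name} {confidence:.2f}"


# Hypothetical raw outputs for one detected region.
print(format_label(detection_confidence(4.0, 5.0)))
```

Because both factors come from a sigmoid, a score displayed as 1.0 is a rounded value very close to 1.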
In object detection with YOLO frameworks, confidence values between 20% and 100% are commonly observed and considered acceptable, since detections are performed in real time over dynamic or static inputs such as videos and images. The confidence levels shown in Figure 13 are therefore high and satisfactory.
The detector performs adequately on concrete structures by detecting the shape of the steel bars, the coloration of corrosion (maroon, dark brown, and dark orange), and its combination with gray, the color of the concrete. However, the detector may fail when corrosion is merged with other elements or textures, as can be seen in Figure 14.
The precision of the YOLOv3 model for object detection can be affected by several factors:
• Training dataset size. Model precision generally improves with the size of the training dataset, since larger datasets provide more examples for the model to learn from.
• Hyperparameter selection. Hyperparameters such as the learning rate, batch size, and anchor boxes affect the training process and the precision of the final model.
• Image quality. The quality of the input images, including lighting conditions and image resolution, affects the model's ability to detect objects accurately.
• Choice of architecture. Aspects of the YOLOv3 architecture, such as the number of layers and the size of the input image, affect the model's ability to detect objects of different sizes and shapes.
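The effect of input size can be quantified with the standard YOLOv3 configuration, which predicts at three scales with strides of 32, 16, and 8 pixels and three anchor boxes per grid cell. The sketch below, with a hypothetical function name, computes the grid sizes and the number of candidate boxes for a square input.

```python
# Standard YOLOv3 setup: three detection scales (strides 32, 16 and 8)
# and three anchor boxes per grid cell at each scale.
STRIDES = (32, 16, 8)
ANCHORS_PER_CELL = 3


def prediction_grid(input_size):
    """Return the grid sizes and total number of candidate boxes for a
    square input of side `input_size` pixels (must be a multiple of 32)."""
    if input_size % 32 != 0:
        raise ValueError("input size must be a multiple of 32")
    grids = [input_size // s for s in STRIDES]
    total_boxes = sum(g * g * ANCHORS_PER_CELL for g in grids)
    return grids, total_boxes


for size in (320, 416, 608):
    grids, total = prediction_grid(size)
    print(size, grids, total)
```

For the common 416 × 416 input this yields grids of 13, 26, and 52 cells and 10,647 candidate boxes; at 608 × 608 the grids grow to 19, 38, and 76 cells, giving the model finer cells in which to localize small features such as thin corroded bars.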

FIGURE 12
Training and validation performances in the object loss function during 31 epochs.

The limitations of the model developed in this study could be mitigated through several measures. One possible solution is to increase the size of the dataset used to train the model, which would provide a more diverse set of examples and allow the model to learn more robust features for detecting corrosion in concrete structures.
Additionally, a newer version of YOLO that handles higher-resolution inputs might improve the model's accuracy by allowing it to capture finer details and better distinguish corrosion from other textures or elements. These improvements will be addressed in future work to enhance the accuracy and robustness of the model.

Figure 14 highlights some limitations of the model trained in this work. In the first photograph, although corrosion damage covers almost the whole area, the model detects only a specific spot. This may be because the corrosion is merged with white elements, which were uncommon in the training set, and it indicates the need for more diverse training data. In the second photograph, the lower bounding box is poorly delimited: the model detects a small corroded bar, but the box is much larger than the detected element. In the third photograph, the model fails on two objects: the top one is a steel mesh that simulates the reinforcing steel of a concrete element, and the bottom one is a rock whose color resembles that of a corroded bar. It is worth mentioning that even the human eye may find the third photograph challenging. However, in general terms, the model performs well on various concrete structures, as shown in Figure 13.

FIGURE 13
Corrosion damage localization on concrete structures using the built detector.

FIGURE 14
False positives and unsatisfactory detections on concrete structures.

Conclusion
Nowadays, technological innovations are being applied to various fields of science. However, some crucial areas of national development still have not fully benefited from these innovations.
This research provides an image dataset of corrosion damage on concrete structures. The dataset is used to train YOLOv3, a state-of-the-art framework, to detect corrosion damage on concrete structures. The framework's hyperparameters were tuned over several experiments to achieve the best model performance.
It is crucial to highlight that the images need to be cleaned and filtered to achieve a homogeneous distribution in the dataset used to train the model. Although the data distribution in this work was heterogeneous (with different scales, capture devices, and sizes), adjusting the hyperparameters helped reduce and avoid overfitting.
The results provided by the modified model show that corrosion damage on concrete structures can be detected accurately. However, a larger dataset is recommended to build a more robust model, a task that will be addressed in future work.
Detecting corrosion on concrete structures with mobile devices remains a real challenge, because corrosion presents a wide range of features to be considered. These include the different color levels of the corrosion process, correct detection of spalling, the coloration of concrete that contains corroded steel, the arrangement patterns of the reinforcing steel, reinforcing steel exposed on different surfaces, dust embedded on the steel (which is common in concrete structures), and many others.
Corrosion damage in concrete structures is a problem that governments need to take seriously, because it reduces infrastructure durability, increases maintenance costs, and reveals design deficiencies and poor construction processes. The advances in corrosion detection on concrete structures made in this research are therefore significant and relevant, because they demonstrate that detecting corroded bars in concrete is achievable.
The main findings of this research are as follows:
• The authors provide an image dataset for detecting corrosion damage on concrete structures using the YOLOv3 framework.
• Frameworks such as YOLOv3 can detect objects in different scenarios; however, training on a custom dataset requires deeper analysis.
• Generally, when the model is trained as is (using the default YOLO weights), it suffers from overfitting.
• The images used for labeling and training should be acquired under conditions that are as similar as possible.
• Exhaustive hyperparameter fine-tuning may lead to the best model performance.
• The advances in corrosion detection on concrete structures made in this research contribute to the state of the art for addressing this problem.

Data availability statement
The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found at: https://github.com/JaGuzmanT/CONCORNET2023, CONCORNET2023 repository, Public access.