Identification of Grout Sleeve Joint Defect in Prefabricated Structures Using Deep Learning

A grout sleeve connection is a typical kind of joint in prefabricated structures. However, for construction and manufacturing reasons, defects in this kind of joint are usually inevitable. The joint quality of a prefabricated structure has a significant influence on its overall performance and can lead to structural failure. Due to the complexity of various types of materials used in grout sleeve connections, traditional non-destructive testing methods, such as Acoustic Emission (AE), Ultrasonic Testing (UT), Guided Wave Testing (GW), are facing great challenges. The recent development of deep learning technology provides a new opportunity to solve this problem. Deep learning can learn the inherent rules and abstract hierarchies of sample data, and it has a powerful ability to extract the intrinsic features of training data in complex classification tasks. This paper illustrates a deep learning framework for the identification of joint defects in prefabricated structures. In this method, defect features are extracted from the acceleration time history response of a prefabricated structure using a convolutional neural network. The proposed method is validated by vibration experiments on a half-scaled, two-floor prefabricated frame structure with column rebars spliced by different defective grout sleeves.


INTRODUCTION
In recent years, prefabricated structures have been widely used in construction, and grout sleeves are the most widely used connection type in these structures. Usually, the special grout used in these connections has good fluidity, early and high strength, and micro-expansion. However, most of these sleeves have complicated internal structures and complicated construction procedures, which often leads to joint defects during the construction process. The underfilling of sleeve grouting is one of the typical defects seen in these connections. Such defects seriously affect the mechanical properties of the joints and eventually the bearing capacity and seismic performance of the overall structure (Zhu et al., 2019). Conveniently and effectively detecting the fullness of sleeve grouting has always been a difficult problem in prefabricated concrete buildings. The ultrasonic wave method (Feng et al., 2020), the impact-echo method (Chou, 2019), the X-ray CT method (Zelelew et al., 2013; Gao et al., 2017), the damped vibration method (Zhu et al., 2018), the X-ray method (du Plessis and Boshoff, 2019), and the embedded wire-drawing method (Gao et al., 2019) are relatively mature detection technologies, but they are expensive and labor intensive, which makes them unsuitable for inspecting every grouted sleeve. Because such defects are typically local, traditional global non-destructive dynamic testing methods also face great challenges when used with complex structures, since they are unable to obtain effective local feature information from global dynamic data (Yoon et al., 2010). Moreover, as these connections become more complex, the influences of their different parts increase, which makes it still harder for these methods to extract feature information effectively.
As a subset of machine learning in artificial intelligence (AI), deep learning involves networks that are capable of learning without supervision from original data. Deep learning technology has the potential to provide a new identification method in a case such as this because it can autonomously learn highly abstract features from original data.
Deep learning has enabled many practical applications of machine learning and, by extension, of the overall field of AI. An increasing number of researchers have applied deep learning to identification problems in the field of civil engineering. LeCun et al. (1989) developed the first deep convolutional neural network (DCNN) to realize handwritten postcode identification (LeNet-5), using back propagation (BP), a supervised training algorithm. Cha et al. (2017) proposed a crack detection method for concrete structures based on a convolutional neural network (CNN) that used crack photographs taken under different conditions (such as different photo sizes, light sources, and shadows) for testing. Dorafshan et al. (2018) compared the performance of commonly used edge detectors and DCNNs in crack detection in concrete structures and proposed a method combining a DCNN and edge detectors that can reduce residual noise. Xu et al. (2019) proposed a fast region-CNN (R-CNN) method for identifying and locating multiple types of seismic damage to damaged reinforced concrete columns from images. The image data sets were established through on-site imaging, and these were expanded using a data-enhancement method. Based on fast R-CNN, Beckman et al. (2019) proposed a method for detecting concrete layer damage and studied the influence of the distance between the specimen and the sensor on the recognition accuracy. Chen et al. (2019) built a four-camera vision system that can obtain visual information about targets, including static objects and a dynamic concrete-filled steel tube specimen. Tang et al. (2019) presented a dynamic real-time detection method for examining surface deformation and full-field strain in recycled aggregate concrete-filled steel tubular columns. Mathematical models were proposed that combined the four-ocular visual coordinates and point-cloud matching.
Abdeljaber et al. (2018) presented an enhanced CNN-based approach that needs only two measurement sets for structural damage detection. Dorafshan and Azari (2020) proposed a one-dimensional CNN that can successfully detect subsurface defects in cement overlay bridges using impact-echo data. Li et al. (2018) proposed a damage-identification method for bridges based on a CNN. The acceleration time-history responses from nine measuring points in a simply supported beam were used as input data to train and test the CNN, and the effects of different excitation sizes and different noise environments on the CNN identification results were analyzed. Xie et al. (2018) proposed a bridge damage-identification method based on a stacked noise-reduction autoencoder. The damage features of the bridge's acceleration response were extracted using multiple autoencoders, and the softmax function was used to identify bridge damage. Zhao (2019) proposed a blade damage-recognition method based on a CNN, selecting a residual neural network (ResNet) as the basic model structure, and combined a batch gradient descent algorithm with the CNN to improve learning efficiency. Lin et al. (2017) proposed a damage-recognition method based on a CNN that automatically extracts features from time-domain response data from the structure. A numerical test of a simply supported Euler-Bernoulli beam was designed, and the CNN was trained and tested using the response time history of the beam under different working conditions. Abdeljaber et al. (2017) developed a structural-damage-recognition system based on a one-dimensional CNN that integrated feature extraction and classification into a complete learning module, thereby realizing vibration-based damage detection and real-time damage location. Pathirage et al. (2018) proposed a deep learning network based on autoencoders that can identify structural damage through vibration responses. The natural frequency, vibration mode, and other vibration characteristics were taken as inputs, and structural damage was the output. Lee et al. (2018) compared DNNs with different hidden layers, activation functions, and optimization algorithms, and tested the performance of different combinations. Khodabandehlou et al. (2019) established an 11-layer, two-dimensional CNN that can extract features from the acceleration response time history of a structure, making it possible to classify bridge damage using acceleration measurements. Duan et al. (2019) proposed a bridge damage detection method based on a CNN. The acceleration response time history and Fourier spectrum were compared as training data sets for the CNN. Gulgec et al. (2019) trained and compared 50 CNNs with different learning rates, convolutional layers, and fully connected layers, and finally proposed a CNN approach for structural damage detection and localization. Wang et al. (2019) proposed a dual-path network composed of a ResNet and a densely connected convolutional network. A spatial time-frequency data set was established using multi-dimensional vibration signals, and different railway events were classified by monitoring data that contained environmental noise. Kim and Sim (2019) proposed a framework composed of a fast R-CNN and a region-suggestion network based on deep learning that can automatically extract peaks in frequency-domain pattern recognition. Tang et al. (2020) presented an overview of recognition and localization methods for vision-based fruit-picking robots. Yu et al. (2019) proposed a deep CNN-based method for the identification of damage in buildings. The proposed method can automatically extract high-level features from raw signals in the time domain, and its performance was verified on a five-level benchmark building. Avci et al. (2020) reviewed recent applications of deep learning methods to vibration-based structural damage detection in civil structures.
Convolutional neural networks have been widely recognized as powerful tools to deal with problems such as image identification. Identification methods based on image processing need to collect response images as input data. While it is usually difficult to collect images of defects in an actual prefabricated structure, acceleration response data can be obtained from such a structure relatively easily. However, effectively utilizing this sensing data for structural response modeling to identify defects remains a challenge.
In this paper, we propose an approach for sleeve joint defect identification in prefabricated frame structures using a CNN with a customized architecture. The proposed method is verified by dynamic tests of a half-scaled, two-floor prefabricated concrete frame structure. Structural acceleration response sensing data are taken as input samples. The existence of defects, and their location and degree, are identified, and the accuracy is evaluated using a confusion matrix (Thongkam et al., 2008).

Grout Sleeve Connection
For prefabricated concrete structures, grout sleeves are the most widely used rebar connection type. As shown in Figure 1, a full grouting sleeve connection mainly includes three parts: a sleeve, ribbed steel bar, and grout. In this paper, the material used for the sleeve was nodular cast iron and the ribbed steel bar was made from steel with a standard value of yield strength of 400 MPa and a modulus of elasticity of 200 GPa. The rebar diameter was 12 mm. According to the Technical Specification for Precast Concrete Structures, the outer and inner diameters of the sleeve were set as 44 and 34 mm, respectively, and the length was set as 250 mm (China Institute of Building Standard Design and Research, 2014). The grout was made from cement as the basic material, and this was mixed with fine aggregate, a concrete admixture, and other materials. After stirring with water, this mixture had good fluidity, early and high strength, and micro-expansion.
To form the joints, two steel bars were inserted into the sleeve, one from each end. The upper and lower longitudinal ribs extended into the sleeve by 120 and 110 mm, respectively.
The special grout was then poured in via the grouting hole and flowed out from the overflow slurry hole until the sleeve was filled. In this situation, the hardened grout grips both the steel bar and the sleeve, and due to its micro-expansion and high strength characteristics, the positive force between the sleeve and the steel bar is strengthened (Zheng et al., 2015).
The construction of a full grouting sleeve connection is very complicated, and sleeve defects such as grout leakage, eccentricity of the steel bars, incomplete filling of grout, and peeling of the grout from the connecting members are usually inevitable during the construction process. Grout sleeve bond failure is a typical but undesirable failure mode in grout sleeve joints. Grout sleeve defects will seriously affect the mechanical properties of the joint, becoming a potential risk to the structure. Figure 2 shows an incomplete grout fill defect produced during on-site construction; this is the most common defect type in actual engineering.
Frontiers in Materials | www.frontiersin.org

CNN-Based Defect Identification Method
This study presents a defect identification method for prefabricated concrete structures based on a deep convolutional neural network. This network can directly extract defect features from the dynamic response of the prefabricated concrete structure. An overview of the proposed method is shown in Figure 3 and is described briefly as follows: (1) an experiment is conducted to obtain acceleration response data for the prefabricated concrete structure; (2) the collected dynamic response data are preprocessed, data sets (training, validation, and test sets) are established, and the samples are labeled; (3) a deep CNN, as shown in Figure 4, is trained on the training data set; (4) the validation loss is calculated for each epoch during training to check for overfitting; (5) the test set is used to verify the feasibility and accuracy of the CNN; and (6) the defect location and degree are identified.
FIGURE 3 | The process of structural defect identification based on a CNN. Here, defect probability indicates that the output is mapped into a probability distribution in the range [0, 1] through the softmax function.
FIGURE 4 | Basic framework of the CNN. Here, C represents a convolution layer, S represents a pooling layer, FC represents the fully connected layer, N represents the number of nodes in the input sample, and M represents the number of nodes in the output.
The deep learning network is established based on LeNet-5 (LeCun et al., 1989), a CNN that is famous for its simple structure and high efficiency. The Matlab toolkit DeepLearnToolbox was employed to establish the proposed CNN architecture. DeepLearnToolbox is the most widely used deep learning toolkit in Matlab, and the CNN in this toolkit was built by Rasmus Berg Palm on the basis of LeNet-5. In a recognition test of the Modified National Institute of Standards and Technology database, an error rate of 1.22% was obtained. The proposed deep CNN basic framework is shown in Figure 4. It can be seen that the CNN consisted of an input layer, three alternating convolutional and pooling layers, a fully connected layer, and an output layer. The components of the CNN and the selection of the activation function and optimization algorithm are described in the next sections.

Convolution Layer
The convolution layer is the core of the CNN, and its purpose is to use a convolution kernel to extract signal features. Generally, convolution is a mathematical operation on two functions of a real variable. The convolution of functions x and w can be calculated as
$$s(t) = (x * w)(t) = \int_{-\infty}^{\infty} x(a)\, w(t - a)\, \mathrm{d}a \tag{1}$$
The discrete form of this convolution operation can be defined as
$$s(t) = (x * w)(t) = \sum_{a=-\infty}^{\infty} x(a)\, w(t - a) \tag{2}$$
In general, the first argument of the convolution operation (x) is called the input, the second argument (w) is called the convolution kernel, and the output (s) is called a feature map. According to the convolution expression, each element in the output is obtained by the weighted addition of the elements of the corresponding block in the input, and the weight values are determined by the convolution kernel. The convolution kernel therefore plays the role of filtering or feature extraction in the convolution. Convolutional layers have powerful feature-learning capabilities. In general, deep networks can continuously and iteratively extract higher-level features from the features of the underlying network.
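The weighted-sum reading of the discrete convolution can be made concrete with a few lines of Python. This is only a minimal sketch: the input signal, the three-point kernel, and the function name `conv1d_valid` are illustrative assumptions and are not taken from the network used in this paper.

```python
def conv1d_valid(x, w):
    """Discrete convolution s(t) = sum_a x(a) * w(t - a).

    Only positions where the kernel fully overlaps the input are kept
    ("valid" mode), so the output has len(x) - len(w) + 1 elements.
    """
    k = len(w)
    flipped = w[::-1]  # true convolution flips the kernel before sliding it
    return [
        sum(x[t + j] * flipped[j] for j in range(k))
        for t in range(len(x) - k + 1)
    ]


# Hypothetical input signal and kernel, chosen only for illustration.
signal = [1.0, 2.0, 3.0, 4.0, 5.0]
kernel = [1.0, 0.0, -1.0]
feature_map = conv1d_valid(signal, kernel)
print(feature_map)  # [2.0, 2.0, 2.0]
```

Each output element is a weighted sum of three neighboring inputs, with the weights set by the kernel. Many deep learning libraries skip the kernel flip and compute a cross-correlation instead; because the kernel is learned, the two conventions are interchangeable in practice.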

Activation Function
Generally, a convolutional layer is followed by a non-linear activation function. The activation function maps the linear output of the convolution into non-linear features, and in doing so filters the features obtained in the convolutional layer. Common non-linear functions in convolutional neural networks include the sigmoid, tanh, softsign, and ReLU functions (Zhou and Mi, 2017). The ReLU function is generally considered to perform best. In this study, the defect identification accuracies obtained using the tanh and ReLU functions with learning rates of 0.005, 0.01, 0.02, and 0.05 were compared using experimentally obtained acceleration response data. The results are shown in Table 1, and these show that the identification accuracy of the tanh function in this example is significantly better than that of the ReLU function. Therefore, the tanh function was selected as the activation function of the convolutional layer in the CNN. An expression for the tanh function is shown in Equation (3).
$$\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}} \tag{3}$$
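For readers who want to see the two compared activation functions side by side, the following sketch evaluates both on a handful of hypothetical pre-activation values. The values are assumptions made for illustration; the sketch does not reproduce the comparison in Table 1.

```python
import math


def tanh(x):
    """Hyperbolic tangent written out from its exponential definition."""
    return (math.exp(x) - math.exp(-x)) / (math.exp(x) + math.exp(-x))


def relu(x):
    """Rectified linear unit: passes positive inputs, zeroes the rest."""
    return max(0.0, x)


# Hypothetical pre-activation values, including negative ones.
values = [-2.0, -0.5, 0.0, 0.5, 2.0]
tanh_out = [tanh(v) for v in values]
relu_out = [relu(v) for v in values]
```

The tanh function is odd and bounded in (-1, 1), so negative inputs keep their sign, whereas ReLU maps every negative input to zero.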

Pooling Layer
Pooling is mainly used to filter redundant features and reduce the number of parameters, avoiding over-fitting. There are two common pooling operations: maximum pooling and average pooling. Maximum pooling takes the maximum value of each pooling window and average pooling takes the average value of each pooling window. In this work, the average pooling method was used. The average pooling operation is shown in Figure 5.
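The two pooling operations can be sketched on a one-dimensional feature map as follows. The window size of two and the feature-map values are assumptions for illustration only; the pooling sizes actually used are those listed for the network in this paper.

```python
def average_pool_1d(values, window):
    """Average of each non-overlapping window; a trailing partial window is dropped."""
    return [
        sum(values[i:i + window]) / window
        for i in range(0, len(values) - window + 1, window)
    ]


def max_pool_1d(values, window):
    """Maximum of each non-overlapping window; a trailing partial window is dropped."""
    return [
        max(values[i:i + window])
        for i in range(0, len(values) - window + 1, window)
    ]


# Hypothetical feature map, chosen only for illustration.
feature_map = [1.0, 3.0, 2.0, 6.0, 5.0, 7.0]
avg_pooled = average_pool_1d(feature_map, 2)  # [2.0, 4.0, 6.0]
max_pooled = max_pool_1d(feature_map, 2)      # [3.0, 6.0, 7.0]
```

Both operations halve the length of the feature map here; average pooling keeps information from every element in the window, while maximum pooling keeps only the largest one.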

Fully Connected Layer and Softmax Classifier
The fully connected layer is connected to all the activations in the previous layer. In this work, all the elements in the feature map of pooling layer S6 were processed into an M-dimensional column vector, multiplied by weighting coefficients, and added with corresponding offsets. The softmax function was then used to calculate the final output of the CNN. The softmax function can be expressed as
$$S_i = \frac{e^{V_i}}{\sum_{j=1}^{C} e^{V_j}} \tag{4}$$
where $S_i$ is the probability that the current input belongs to the i-th category, e is Euler's number, $V_i$ is the i-th output of the previous output unit, and C is the total number of categories. As shown in Equation (4), the softmax classifier maps the output of the previous output unit to the interval (0, 1), and this represents the relative probabilities of different categories so that the input samples can be classified according to their probability.
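A minimal sketch of the softmax mapping is given below. The three input values are hypothetical, and subtracting the maximum before exponentiating is a common numerical-stability step that is assumed here rather than taken from the paper; it does not change the result.

```python
import math


def softmax(v):
    """Map raw outputs V_1..V_C to probabilities S_1..S_C that sum to one."""
    shift = max(v)  # assumed stability trick: exp() never sees a large positive number
    exps = [math.exp(x - shift) for x in v]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical outputs of the last fully connected layer for C = 3 categories.
raw_outputs = [2.0, 1.0, 0.1]
probabilities = softmax(raw_outputs)
predicted_category = probabilities.index(max(probabilities))
```

The ordering of the raw outputs is preserved, so the category with the largest raw output also receives the largest probability.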

Batch Gradient Descent Algorithm
In the calculation process of the backpropagation algorithm, the parameters in the CNN need to be optimized to minimize the cost function to obtain an optimal solution. In this work, a cross-entropy cost function was used. For one-hot labels and a softmax output, the cross-entropy cost function can be expressed as
$$E = -\frac{1}{N}\sum_{i=1}^{N}\sum_{c=1}^{C} y_c^{(i)} \log o_c^{(i)} \tag{5}$$
where E represents the cost function value, N represents the total number of samples, $y^{(i)}$ represents the true output (label) of the i-th sample, $o^{(i)}$ represents the predicted output of the i-th sample, and the subscript c indexes the C categories. The mini-batch gradient descent algorithm was used as the optimization algorithm. As shown in Table 1, in this example, the identification accuracy is better when the learning rate is 0.01, so the learning rate was set to 0.01 in the gradient descent algorithm.
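The role of the cost function and of the learning rate can be illustrated with the sketch below. It assumes the categorical form of the cross-entropy for one-hot labels, which is the usual companion of a softmax output; the labels, predictions, weights, and gradients are all hypothetical values, and only the learning rate of 0.01 comes from the text.

```python
import math


def cross_entropy(labels, outputs, eps=1e-12):
    """Mean categorical cross-entropy over N samples with one-hot labels."""
    total = 0.0
    for y, o in zip(labels, outputs):
        # eps guards against log(0); it is an implementation assumption.
        total -= sum(y_c * math.log(o_c + eps) for y_c, o_c in zip(y, o))
    return total / len(labels)


def gradient_step(weights, gradients, learning_rate=0.01):
    """One descent update: w <- w - learning_rate * dE/dw."""
    return [w - learning_rate * g for w, g in zip(weights, gradients)]


# Hypothetical two-class labels and predictions.
labels = [[1, 0], [0, 1]]
confident = [[0.99, 0.01], [0.02, 0.98]]
uncertain = [[0.5, 0.5], [0.5, 0.5]]
loss_confident = cross_entropy(labels, confident)
loss_uncertain = cross_entropy(labels, uncertain)  # close to log(2)

# Hypothetical weights and batch-averaged gradients.
updated = gradient_step([0.5, -0.3], [1.0, -2.0])
```

Confident, correct predictions give a cost near zero, while uninformative predictions give a cost near log(2) for two classes. In mini-batch gradient descent the gradients passed to the update are averaged over one batch of samples rather than over the whole training set.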

Experimental Verification
To verify the accuracy and feasibility of the proposed deep learning method, dynamic excitation tests were carried out on a prefabricated half-scaled, two-floor concrete frame with different defective columns.

Experimental Model
A half-scaled, two-floor prefabricated concrete frame structure was constructed. It mainly comprised four parts: precast columns, precast beams, precast foundation beams, and a poured concrete slab. The beams and columns were made from concrete with a standard cube compressive strength of 30 N/mm². Longitudinal reinforcements and stirrups were made from hot-rolled ribbed bars with a standard value of yield strength of 400 MPa and a modulus of elasticity of 200 GPa. The full grouting sleeve connection method was used to splice the internal reinforcements in the foundation beams and columns. A plan of the column grid in this experimental model is shown in Figure 6A, and elevation views are shown in Figure 6B. A total of 48 grouting connections were used in this frame structure model. The grouting process is shown in Figure 7.
FIGURE 7 | The grouting process.
In this test, precast beam and column members were manufactured by professional prefabricated assembly member manufacturing plants, and they were assembled in the laboratory of Tongji University after curing. The frame structure model after assembly is shown in Figure 8.

Defect Setting
In these experiments, defects in grout sleeve joints were introduced through incomplete grouting. To avoid damage to or collapse of the structure, the defects were arranged in the precast columns of the second floor instead of the first floor. As shown in Figure 9, for comparative analysis, defects were set on only one of the columns on either side of a particular beam. Excitation points were arranged across the span of the beams. Thus, for the same excitation, both defective column vibration responses and non-defective column vibration responses could be obtained. As shown in Table 2, there were seven working conditions corresponding to seven excitation points. Fourteen measuring points were arranged for each working condition. Taking working condition 5 as an example, the locations and numbers of the excitation points and measuring points are shown in Figure 10.

Data Collection
Excitation was applied by vibration exciters, and acceleration sensors were arranged at the measuring points to collect acceleration responses. The excitation force was 200 N and the sampling frequency was 1,024 Hz. Each excitation point was excited once, and the duration of excitation was 60 s.
According to the acceleration time-history curve analysis, the amplitude at each measuring point in each working condition was different. For the stability of the convolutional neural network during the training process, a normalization layer was added to the proposed network (Dorafshan and Azari, 2020). The amplitude of the acceleration time history at each measuring point was normalized; here x denotes the original acceleration time history, y denotes the acceleration time history after amplitude normalization, and i is the index of the measuring point. The acceleration time history from each measuring point was taken as the input sample. For each working condition, 14 acceleration time histories were obtained. As shown in Figure 11, the 60 × 1,024 data points recorded at each measuring point were divided into 5,000 parts. Each part includes 1,024 data points, and adjacent parts overlap by 1,012 data points, which corresponds to a shift of 12 data points between consecutive parts. The first seven acceleration time histories were collected from the columns and beams without defects and constituted the non-defective sample; the last seven acceleration time histories were collected from the defective columns and beams and constituted the defective sample. That is, there were in total 5,000 non-defective samples and 5,000 defective samples for each working condition, and each sample contained 7 × 1,024 acceleration data points. Sixty percent of the total samples were randomly selected as the training set, 20% were randomly selected as the validation set, and the remaining 20% constituted the test set.
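The segmentation and the 60/20/20 split described above can be checked with a short sketch. The window length, overlap, number of parts, sampling frequency, and duration are taken from the text; the function names and the random seed are illustrative assumptions.

```python
import random

SAMPLING_RATE_HZ = 1024
DURATION_S = 60
WINDOW = 1024      # data points per part
OVERLAP = 1012     # data points shared by adjacent parts
N_PARTS = 5000


def segment_starts(total_points, window, overlap, n_parts):
    """Start index of each part when consecutive parts are shifted by window - overlap."""
    stride = window - overlap
    starts = [k * stride for k in range(n_parts)]
    if starts[-1] + window > total_points:
        raise ValueError("record too short for the requested number of parts")
    return starts


def split_indices(n_samples, seed=0):
    """Random 60/20/20 split into training, validation, and test indices."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # the seed is an illustrative assumption
    n_train = n_samples * 60 // 100
    n_val = n_samples * 20 // 100
    return (
        indices[:n_train],
        indices[n_train:n_train + n_val],
        indices[n_train + n_val:],
    )


total_points = SAMPLING_RATE_HZ * DURATION_S          # 61,440 data points per record
starts = segment_starts(total_points, WINDOW, OVERLAP, N_PARTS)
train_idx, val_idx, test_idx = split_indices(2 * N_PARTS)  # defective + non-defective
```

With a shift of 12 data points, the 5,000th part ends at data point 61,012, which fits inside the 61,440-point record. The split yields 6,000 training, 2,000 validation, and 2,000 test samples per working condition.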
The number of training set samples generated in this experiment was relatively small. The batch size was therefore set to five to ensure that there were a sufficient number of samples for training; the epoch number was set to 20. The number of iterations can be calculated as
$$IN = EN \times \frac{TS}{BS}$$
where IN represents the number of iterations, EN represents the epoch number, TS represents the training set size, and BS represents the batch size.
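As a worked example of the iteration count, the sketch below assumes the usual relation in which every epoch passes once through the training set in batches, so that the number of iterations equals the epoch number times the number of batches per epoch. The sample counts, batch size, and epoch number come from the text; the function name is hypothetical.

```python
def iteration_count(epoch_number, training_set_size, batch_size):
    """Iterations = epochs x batches per epoch (assumed relation)."""
    batches_per_epoch = training_set_size // batch_size
    return epoch_number * batches_per_epoch


# Defect location identification: 60% of 10,000 samples per working condition.
location_training_size = 10_000 * 60 // 100
location_iterations = iteration_count(20, location_training_size, 5)
print(location_iterations)  # 24000

# Defect degree identification: 60% of 15,000 samples.
degree_training_size = 15_000 * 60 // 100
degree_iterations = iteration_count(20, degree_training_size, 5)
print(degree_iterations)  # 36000
```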

Defect Location Identification
The defects and their locations were identified for the samples in each working condition. The samples were labeled with a vector consisting of eight elements. The first seven elements represent the probability that a defect is located at each of the seven corresponding measuring points. The eighth element represents the probability that the sample is a non-defective sample. In this experiment, a defect was set at measuring point 4, so the label of the defective sample was [0, 0, 0, 1, 0, 0, 0, 0]; the label of the non-defective sample was [0, 0, 0, 0, 0, 0, 0, 1]. Thus, a sample is identified as defective when the fourth element of its output vector is >0.95 and as non-defective when the eighth element is >0.95; the identification is invalid when no element of the output vector is >0.95. Based on the above label setting, the detailed parameters for each layer in the CNN are shown in Table 3. As shown in Figure 12, during the training process, the loss function in the CNN gradually decreased with an increasing number of iterations and converged to a stable level for each working condition. As shown in Figure 13, the validation loss was calculated for each epoch during training to check for overfitting, and training was stopped at the 20th epoch, by which point the validation loss no longer changed significantly. The identification accuracies for each working condition during the training process of the CNN are shown in Figure 14. It can be seen that during the CNN training process, the identification accuracy of the samples in the seven working conditions increased continuously with the number of epochs and finally stabilized at 100%. The four longitudinal working conditions (conditions 1-4) converged faster than the three lateral working conditions (conditions 5-7).
For each working condition, 2,000 test samples were sequentially input into the trained convolutional neural network. The test results showed that the identification accuracies of the seven test sets were all 100%. This means that the proposed method can successfully identify defects and their locations.
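The decision rule applied to each output vector can be sketched as follows. The 0.95 threshold and the eight-element layout come from the text; generalizing the rule from measuring point 4 to whichever of the first seven elements exceeds the threshold is an assumption made for illustration, as are the function name and the example output vectors.

```python
THRESHOLD = 0.95
NON_DEFECTIVE_INDEX = 7  # eighth element of the eight-element output vector


def interpret_output(output):
    """Turn a softmax output vector into a defect decision."""
    best = max(range(len(output)), key=lambda i: output[i])
    if output[best] <= THRESHOLD:
        return "invalid"  # no element exceeds the threshold
    if best == NON_DEFECTIVE_INDEX:
        return "non-defective"
    return f"defect at measuring point {best + 1}"


# Hypothetical output vectors, not measured results.
print(interpret_output([0.0, 0.0, 0.01, 0.97, 0.0, 0.0, 0.0, 0.02]))
print(interpret_output([0.0, 0.0, 0.0, 0.01, 0.0, 0.0, 0.0, 0.99]))
print(interpret_output([0.0, 0.0, 0.0, 0.60, 0.0, 0.0, 0.0, 0.40]))
```

Because the softmax outputs sum to one, at most one element can exceed 0.95, so the three outcomes are mutually exclusive.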

Defect Degree Identification
For defect degree identification, the precast defective columns of working conditions 5, 6, and 7 covered the defect degrees of 25, 75, and 50% with the same boundary conditions in the test. These three working conditions (5, 6, and 7) were therefore taken as the research object of this section. There were 15,000 samples in total; 60% of these were randomly selected as the training set, 20% were selected as the validation set, and the remaining 20% constituted the test set. The batch size was set to five, and the epoch number was set to 20. The sample labels were vectors consisting of three elements. The first element represents a defect degree of 25%, the second element represents a defect degree of 50%, and the third element represents a defect degree of 75%. For example, the label of a sample with a 25% defect degree would be [1, 0, 0].
To realize the identification of the defect degree, the difference between the sample output vector and the sample label vector is calculated using the Euclidean distance (Schnitzer et al., 2012). The Euclidean distance ρ between two points $(x_1, x_2, \ldots, x_n)$ and $(y_1, y_2, \ldots, y_n)$ in n-dimensional space can be calculated from
$$\rho = \sqrt{(x_1 - y_1)^2 + (x_2 - y_2)^2 + \cdots + (x_n - y_n)^2}$$
A confusion matrix is used to measure the classification and prediction ability of the model (Thongkam et al., 2008). The identification result of a sample can be one of four types, as shown in Table 4: true positive (TP), false positive (FP), false negative (FN), and true negative (TN). The precision ratio (p) and recall ratio (r) of the confusion matrix can be calculated from
$$p = \frac{TP}{TP + FP}, \qquad r = \frac{TP}{TP + FN}$$
The training and validation losses during the training process are shown in Figure 15. Three samples (from three working conditions) were randomly selected from the test set. The Euclidean distances between the output vectors and the sample labels are shown in Table 5. Test samples were sequentially input into the trained convolutional neural network, and the confusion matrix of the identification results is shown in Table 6. This table shows that both the precision ratio and the recall ratio for each sample type were 100%. The precision ratio evaluates the prediction results: it is the proportion of truly positive samples among the samples predicted as positive. A precision ratio of 100% indicates that every sample assigned to a given defect degree actually had that defect degree. The recall ratio evaluates the original samples: it is the proportion of positive samples that are predicted correctly. A recall ratio of 100% indicates that all the samples with different defect degrees were predicted correctly. Table 6 shows that the CNN is stable and has good identification performance for structural defect identification in a prefabricated frame. There was no significant difference between the recall ratio and the precision ratio, indicating that the CNN does not show bias toward different defect types.
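The distance and confusion-matrix calculations can be sketched together as below. The three label vectors come from the text. Assigning a sample to the label with the smallest Euclidean distance is an assumed reading of how the distance is used, and the output vector and the confusion matrix are hypothetical numbers chosen so that the ratios are not trivially 100%; they are not the results in Tables 5 and 6.

```python
import math

DEGREE_LABELS = {"25%": [1, 0, 0], "50%": [0, 1, 0], "75%": [0, 0, 1]}


def euclidean_distance(x, y):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def nearest_degree(output):
    """Defect degree whose label vector is closest to the output (assumed rule)."""
    return min(
        DEGREE_LABELS,
        key=lambda name: euclidean_distance(output, DEGREE_LABELS[name]),
    )


def precision_recall(confusion, k):
    """confusion[i][j] = number of samples of true class i predicted as class j."""
    tp = confusion[k][k]
    fp = sum(row[k] for row in confusion) - tp
    fn = sum(confusion[k]) - tp
    return tp / (tp + fp), tp / (tp + fn)


predicted = nearest_degree([0.05, 0.90, 0.05])  # hypothetical output vector

# Hypothetical 3 x 3 confusion matrix (rows: true class, columns: predicted class).
hypothetical_confusion = [[8, 2, 0], [1, 9, 0], [0, 0, 10]]
p0, r0 = precision_recall(hypothetical_confusion, 0)
```

In the hypothetical matrix, class 0 has a precision of 8/9 and a recall of 8/10. When the matrix is purely diagonal, as reported for the test set in this paper, both ratios equal 1 for every class.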

CONCLUSIONS
This paper presented a deep-CNN-based method for the identification of sleeve joint defects in prefabricated concrete frame structures. The proposed method uses LeNet-5 as the basic framework and refers to the CNN in the DeepLearnToolbox. Non-destructive dynamic tests on a half-scaled, two-floor prefabricated concrete frame structure were carried out. The CNN was trained using only the collected acceleration responses to extract features for sleeve joint defect identification. In the experiment, there were seven working conditions due to the different defect degrees and boundary conditions. For each working condition, 5,000 non-defective samples and 5,000 defective samples were collected; 60% of these were selected randomly as the training set, 20% were selected as the validation set, and the remaining 20% were used to test the CNN. The defects and their locations were identified using the CNN, and the identification accuracy was 100% in each working condition. The Euclidean distances between the output vectors and the label vectors were calculated to determine the defect identification result, and a confusion matrix was used to judge the identification accuracy. Defects with different degrees were identified in lateral working conditions 5, 6, and 7. The results showed that both the precision ratio and the recall ratio were 100%, and the proposed method did not show bias toward different defect degrees. Overall, the proposed method was found to be very effective in joint defect identification in prefabricated concrete frame structures in the experimental situation examined in this paper.