Large-factor Micro-CT super-resolution of bone microstructure

Background: Bone microstructure is important for evaluating bone strength and requires high-resolution (HR) imaging equipment. Computed tomography (CT) is widely used in medical imaging, but its spatial resolution is not sufficient for bone microstructure. Micro-CT is the gold standard for imaging human bone microstructure and for animal experiments. However, while it provides high-quality images, Micro-CT delivers a higher ionizing-radiation dose and requires longer scanning times. It therefore makes sense to reconstruct HR images with less radiation, and image super-resolution (SR) is well suited to this problem. The specific objective of this study is to reconstruct HR images of bone microstructure from low-resolution (LR) images under a large-factor condition. Methods: We propose a generative adversarial network (GAN) based on Res2Net and the residual channel attention network, named R2-RCANGAN. We use real high-resolution and low-resolution training data so that the model learns the actual image corruption of Micro-CT, and we train six super-resolution models, such as the super-resolution convolutional neural network, to benchmark our method. Results: In terms of peak signal-to-noise ratio (PSNR), our proposed generator network R2-RCAN sets a new state of the art. Such PSNR-oriented methods have high reconstruction accuracy, but their perceptual index, which evaluates perceptual quality, is very poor. We therefore combine the generator network R2-RCAN with a U-Net discriminator and a loss function with adjusted weights; the resulting R2-RCANGAN shows pleasing results in both reconstruction accuracy and perceptual quality compared with the other methods. Conclusion: The proposed R2-RCANGAN is the first to apply large-factor SR to improve Micro-CT images of bone microstructure. The next step is to investigate the role of SR in image enhancement during the fracture rehabilitation period, which would be of great value in reducing ionizing radiation and promoting recovery.

provides inspiration and innovative ideas for SR [20]. The first DL-based SR method appeared in 2014, when Dong et al. proposed the super-resolution convolutional neural network (SRCNN) with a three-layer convolutional network [21]. Since SRCNN, DL has provided new routes toward high-performance SR, and performance improved further through improved upsampling modules [22,23], newly proposed backbones [24][25][26][27][28], and modified loss functions [29]. The above DL-based methods use the mean square error (MSE) or L1 as the loss function to improve the peak signal-to-noise ratio (PSNR), and their performance has improved continuously. However, some studies have pointed out that PSNR-oriented methods lose high-frequency textures and are inconsistent with human visual perception. Recently, the popularity of generative adversarial networks (GAN), which enable CNNs to learn feature representations from complex data distributions, has made it possible to solve this problem. GAN-based methods have achieved good results [30][31][32].
In the area of medical imaging, DL has been used successfully in many tasks, such as disease classification, outcome prediction, and medical image segmentation [33][34][35][36]. DL-based SR algorithms have made progress in several medical imaging modalities. Zhang et al. proposed a hybrid model to improve CT resolution [37]. Chen et al. proposed a 3D densely connected SR model to restore HR features of brain magnetic resonance images (MRI). Dong et al. proposed a multi-encoder structure based on structural loss and adversarial loss for magnetic resonance spectroscopic imaging (MRSI) resolution enhancement [38]. SR has also made progress in pre-clinical research: You et al. proposed GAN-CIRCLE to enhance the spatial resolution of Micro-CT scans of bone [39], and Xie et al. developed an auto-encoder structure to reconstruct HR bone microstructure [40].
However, the following important challenges remain: 1) Most previous studies focus on small-factor SR (2× and 4×) [41]. Large-factor image SR is likely to be required in pre-clinical and clinical imaging, yet the larger the factor, the more difficult the SR problem, so large-factor SR requires more effective approaches [19]. 2) The corruption of Micro-CT images is unknown [5]; it is certainly quite different from that of LR images obtained by downsampling HR images [20]. 3) Reconstructed images require a balance between reconstruction accuracy and the preservation of high-frequency textures [19].
Motivated by the aforementioned drawbacks, in this study we made major efforts in the following aspects. First, we chose Micro-CT images of rat fracture models. Previous research has established that Micro-CT images with a voxel size of 10 μm are sufficient to observe fractures in rats [6,8]. To achieve the large-factor SR condition, the HR and LR voxel sizes in this research are 10 μm and 80 μm. Second, our HR and LR images are real data from Micro-CT, which enables the SR model to learn the actual corruption of Micro-CT. Finally, we propose a new GAN combining Res2Net [42] and the residual channel attention network (RCAN) [28], which we name R2-RCANGAN. The generator network R2-RCAN increases the network width and adds multi-scale feature extraction while exploiting the contribution of each channel. The U-Net discriminator with spectral normalization (SN) gives more stable training [32]. We also adjust the weights of the loss function, which enables the recovered images to have both better perceptual quality and the accuracy required of pre-clinical images. Our R2-RCANGAN achieves the best results over the other classical SR models. To summarize, the specific objective of this study is to develop an 8× SR model based on Micro-CT images. The following innovative points introduced in the paper are worth mentioning: 1) Large-factor (8×) SR is unusual in pre-clinical or clinical imaging.
2) Our HR and LR images are real data from the device, so the SR model learns the actual Micro-CT corruption. 3) We propose a new network structure, R2-RCANGAN. Our generator network R2-RCAN sets a new state of the art in terms of PSNR. In addition, we combine a stable U-Net discriminator and a loss function with appropriate weights, which significantly improve the perceptual quality of the reconstructed images. R2-RCANGAN maintains as much accuracy as possible in pre-clinical images while achieving good perceptual quality.

Dataset preparation
Micro-CT with high spatial resolution offers important support for imaging small animals. However, a major problem with Micro-CT is its poor temporal resolution: one Micro-CT scanning cycle spans numerous respiration cycles when imaging living animals [43]. This study used live rats, and even when the position of the rats was strictly maintained, their respiratory motion caused the HR and LR images to mismatch. To solve this problem, this research applies a feature point matching approach to make LR-HR image pairs, as shown in Figure 1.
In this work, we use the A-KAZE algorithm to detect feature points and the Brute-Force approach to match them [44]. After matching, we select two pairs of feature points a-A and b-B (points a and b from the LR image, points A and B from the HR image). Connecting ab, we calculate its angle with the horizontal direction by Eq. 1:

θ_ab = arctan((y_b − y_a) / (x_b − x_a)),    (1)

where (x_a, y_a) and (x_b, y_b) are the coordinates of a and b. The angle between AB and the horizontal is calculated in the same way. We then rotate the LR image into the same orientation as the HR image according to the angle difference between ab and AB. After rotating, we crop the HR and LR images separately, with the feature point pair as the center, to obtain the corresponding LR-HR image pair. With the above operations, a 40 × 40 pixel sub-image is cropped from the LR image and a 320 × 320 pixel sub-image from the HR image. This not only corrects the motion shift between images, but also crops them to a size suitable for deep learning training.
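The alignment arithmetic above can be sketched in a few lines of numpy. This is a minimal illustration with hypothetical matched points; real feature detection and matching would use A-KAZE and Brute-Force matching via an imaging library such as OpenCV, which is not shown here.

```python
import math

import numpy as np


def line_angle(p, q):
    """Angle (radians) between segment pq and the horizontal axis (Eq. 1)."""
    (xa, ya), (xb, yb) = p, q
    return math.atan2(yb - ya, xb - xa)


def alignment_rotation_deg(lr_pair, hr_pair):
    """Rotation (degrees) that brings the LR segment ab parallel to AB."""
    return math.degrees(line_angle(*hr_pair) - line_angle(*lr_pair))


def center_crop(img, center, size):
    """Crop a size x size patch centred on a matched feature point."""
    cx, cy = center
    half = size // 2
    return img[cy - half:cy + half, cx - half:cx + half]


# Hypothetical matched points: a, b in the LR image; A, B in the HR image.
a, b = (60, 60), (100, 60)          # horizontal segment in the LR image
A, B = (480, 480), (800, 800)       # 45-degree segment in the HR image
rotation = alignment_rotation_deg((a, b), (A, B))   # LR rotated by 45 deg

lr_patch = center_crop(np.zeros((160, 160)), a, 40)     # 40 x 40 LR sub-image
hr_patch = center_crop(np.zeros((1280, 1280)), A, 320)  # 320 x 320 HR sub-image
```

In practice the rotation itself would be applied with an image-warping routine before cropping; only the angle and crop geometry are shown here.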

Generator network
As shown in Figure 2, our generator network, named R2-RCAN, is an enhanced model of RCAN [28]. While retaining the residual channel attention blocks (RCAB) of RCAN, it alternates RCAB-groups with Res2-groups to widen the network and add multi-scale feature extraction.

Channel attention (CA)
CA is an important structure of the RCAB. CA can adaptively adjust the feature weight of each channel to make the network focus on more useful feature channels. The structure of CA is shown in Figure 2A. CA begins with a global average pooling layer. Next, two 1 × 1 convolutional layers down-scale and then up-scale the channel dimension to obtain the weight of each channel. Finally, CA multiplies the channel weights with the input feature maps to obtain new feature maps.
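A minimal numpy sketch of this squeeze-and-rescale computation follows. The weight matrices are random stand-ins for the learned 1 × 1 convolutions, which reduce to plain matrix products on the pooled channel vector.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def channel_attention(feat, w_down, w_up):
    """Channel attention on a (C, H, W) feature map.

    w_down (C//r, C) and w_up (C, C//r) play the role of the two 1x1
    convolutions; r is the channel-reduction ratio.
    """
    pooled = feat.mean(axis=(1, 2))                  # global average pooling -> (C,)
    hidden = np.maximum(w_down @ pooled, 0)          # down-scale + ReLU
    weights = sigmoid(w_up @ hidden)                 # per-channel weights in (0, 1)
    return feat * weights[:, None, None]             # rescale each channel


rng = np.random.default_rng(0)
C, r = 64, 16
feat = rng.standard_normal((C, 8, 8))
out = channel_attention(feat,
                        rng.standard_normal((C // r, C)) * 0.1,
                        rng.standard_normal((C, C // r)) * 0.1)
```

Because every channel is multiplied by a positive scalar, the sign pattern of the feature map is preserved; only per-channel magnitudes change.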

RCAB
The principle of RCAB is to add CA to the residual block. The structure of RCAB is shown in Figure 2B; it contains two convolutional layers and one CA layer. Eq. 2 represents the input after the two convolutional layers:

X_{i,j} = W²_{i,j} φ(W¹_{i,j} F_{i,j−1}),    (2)

where i and j denote the jth RCAB of the ith group, F_{i,j−1} is the input and X_{i,j} the output, W¹_{i,j} and W²_{i,j} represent the two stacked convolutional layers, and φ represents the ReLU activation function. The result X_{i,j} is then fed to the CA layer as in Eq. 3:

F_{i,j} = F_{i,j−1} + R_{i,j}(X_{i,j}) · X_{i,j},    (3)

where F_{i,j} is the output of this layer and R_{i,j} denotes the CA. Each RCAB-group contains 20 RCABs, one convolutional layer, and a short skip connection (SSC).
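The residual arithmetic of Eqs. 2-3 can be followed with a toy numpy sketch in which the convolutions and the channel attention are replaced by simple stand-ins (identities and a uniform 0.5 weighting; in the real network these are all learned).

```python
import numpy as np


def rcab(f_prev,
         conv1=lambda x: x,             # stand-in for W1 (a 3x3 conv)
         conv2=lambda x: x,             # stand-in for W2 (a 3x3 conv)
         relu=lambda x: np.maximum(x, 0),
         ca=lambda x: 0.5 * x):         # stand-in for R(X) * X (channel attention)
    """One RCAB step: X = W2(phi(W1(F_prev))) (Eq. 2); F = F_prev + CA(X) (Eq. 3)."""
    x = conv2(relu(conv1(f_prev)))
    return f_prev + ca(x)


f = np.ones((4, 4))
out = rcab(f)       # 1 + 0.5 * relu(1) = 1.5 everywhere
```

The residual connection means an RCAB can fall back to the identity when its branch contributes nothing, which is what makes very deep stacks of them trainable.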

Res2-group
The Res2-group is a stack of Res2blocks [42]. The Res2block is a novel backbone in Res2Net; its structure is shown in Figure 2C. It represents multi-scale features and expands the range of receptive fields of each network layer. The Res2block used in this study integrates the Squeeze-and-Excitation (SE) block [45], which establishes channel dependencies and adaptively recalibrates the channel-wise feature responses. This is similar to the RCAB in terms of channel response, so the Res2-group in this study is compatible with the RCAB-group. As the network depth increases, the receptive field expands, multi-scale features are expressed, and the network width grows. Each Res2-group contains five Res2blocks and an SSC.
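The hierarchical split-and-add connection inside a Res2block can be sketched in numpy. A ReLU stands in for each learned 3 × 3 convolution K_i, and the SE block is omitted; the point is only the group wiring that gives later groups a larger receptive field.

```python
import numpy as np


def res2_split(feat, s=4, k=lambda x: np.maximum(x, 0)):
    """Multi-scale connection of a Res2block on a (C, H, W) feature map.

    The channels are split into s groups x1..xs; each group after the first
    passes through a conv K_i (stood in for by a ReLU here) and receives the
    previous group's output:
        y1 = x1
        y2 = K2(x2)
        yi = Ki(xi + y_{i-1}),  i = 3..s
    The outputs are concatenated back along the channel axis.
    """
    xs = np.split(feat, s, axis=0)
    ys = [xs[0], k(xs[1])]
    for xi in xs[2:]:
        ys.append(k(xi + ys[-1]))
    return np.concatenate(ys, axis=0)


feat = np.arange(8.0).reshape(8, 1, 1)   # 8 channels, split into 4 groups of 2
out = res2_split(feat)                   # later groups accumulate earlier outputs
```

On this toy input the group outputs are [0, 1], [2, 3], [6, 8], [12, 15]: each later group mixes in everything the previous groups saw, which is the multi-scale effect the text describes.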

Discriminator network
We use the U-Net discriminator with spectral normalization (SN) [32]. Its structure is shown in Figure 3: it contains convolution layers, SN layers, and SSC. SN prevents the GAN training instability caused by real CT corruption, and studies have demonstrated that it helps alleviate the over-sharp and annoying artifacts introduced by GAN training.
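A minimal numpy sketch of spectral normalization for a dense weight matrix follows. In practice SN is applied to the discriminator's convolution kernels during training, usually with a single power-iteration step per update; many steps are used here only to make the estimate tight.

```python
import numpy as np


def spectral_normalize(w, n_iter=50, eps=1e-12):
    """Divide W by its largest singular value, estimated by power iteration,
    so the layer's Lipschitz constant is bounded by ~1. This bounding is
    what stabilizes the U-Net discriminator's training."""
    rng = np.random.default_rng(0)
    u = rng.standard_normal(w.shape[0])
    for _ in range(n_iter):
        v = w.T @ u
        v /= np.linalg.norm(v) + eps
        u = w @ v
        u /= np.linalg.norm(u) + eps
    sigma = u @ w @ v            # estimated largest singular value
    return w / sigma


w = np.diag([4.0, 2.0, 1.0])     # largest singular value is 4
w_sn = spectral_normalize(w)     # largest singular value is now ~1
```

After normalization the spectral norm of `w_sn` is approximately one regardless of how large the raw weights grew, which keeps the discriminator's gradients bounded.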

Loss function
Our loss function contains three parts: L1 loss, GAN loss, and perceptual loss. The L1 loss is the content loss that evaluates the 1-norm distance between the recovered SR image and the HR image, as in Eq. 4:

L_1 = (1/n) Σ_{k=1}^{n} (1/(HW)) Σ_{x=1}^{H} Σ_{y=1}^{W} |I^{HR}_{x,y} − I^{SR}_{x,y}|,    (4)

where n is the number of samples, H and W are the length and width of the image, and I is the image pixel value. The GAN loss requires the discriminator loss, which is given in Eq. 5:

L_D = −E[log(D_Ra(x_HR, x_SR))] − E[log(1 − D_Ra(x_SR, x_HR))],    (5)

where we use the relativistic discriminator [46]: D_Ra(x_HR, x_SR) indicates the degree to which HR images are more realistic than SR images, and D_Ra(x_SR, x_HR) the degree to which SR images are less realistic than HR images. The generator loss is given in Eq. 6:

L_G = −E[log(1 − D_Ra(x_HR, x_SR))] − E[log(D_Ra(x_SR, x_HR))].    (6)

In addition to the above two loss functions, there is also a perceptual loss L_percep [32]: the SR and HR images are fed into a pre-trained VGG19, and the MSE is computed between the feature maps after the fourth convolutional layer. The total loss is given in Eq. 7:

L_total = α L_1 + β L_GAN + δ L_percep,    (7)

where α, β, and δ are constants.
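The three loss terms can be sketched in numpy. The relativistic discriminator below follows the standard formulation (sigmoid of the difference between a sample's raw score and the mean score of the other class); the α, β, δ values are illustrative defaults, not the weights used in the paper, and the perceptual term is treated as a precomputed scalar.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def l1_loss(sr, hr):
    """Eq. 4: mean absolute error over all pixels of the batch."""
    return np.mean(np.abs(sr - hr))


def relativistic_losses(c_real, c_fake):
    """Eqs. 5-6 with the relativistic average discriminator:
    D_Ra(x, y) = sigmoid(C(x) - mean(C(y))), where c_real and c_fake are the
    raw discriminator outputs for HR (real) and SR (fake) batches."""
    d_rf = sigmoid(c_real - c_fake.mean())   # is real more realistic than fake?
    d_fr = sigmoid(c_fake - c_real.mean())   # is fake more realistic than real?
    loss_d = -np.mean(np.log(d_rf)) - np.mean(np.log(1 - d_fr))
    loss_g = -np.mean(np.log(1 - d_rf)) - np.mean(np.log(d_fr))
    return loss_d, loss_g


def total_loss(l1, l_gan, l_percep, alpha=1.0, beta=0.1, delta=1.0):
    """Eq. 7: weighted sum of the three terms (weights here are illustrative)."""
    return alpha * l1 + beta * l_gan + delta * l_percep
```

When the discriminator cannot tell real from fake (equal scores), both relativistic losses settle at 2·ln 2, the usual GAN equilibrium value.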

Datasets
All procedures performed in studies involving animals were in accordance with the ethical standards of the national research committee and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards.
The data (HR and LR) used in this study were derived from 10 living rat fracture models (median age 55 days, median body weight 200 g). All ankle-fracture models were established by professional operators (the fracture depth reached the trabecular bone, but the medial malleolus artery was not damaged), and the tibia down to the ankle bone was scanned. We used the same Bruker SkyScan Micro-CT scanner for both resolutions: the HR scan comprises 4,000 slices at a 10 μm voxel size, and the LR images are 160 × 160 pixels, 500 slices at an 80 μm voxel size. We therefore select one HR image out of every eight to match each LR image (500 pairs). By making LR-HR image pairs as in Section 2.1, HR and LR images are cropped to 320 × 320 pixels and 40 × 40 pixels. We screened 1,279 LR-HR image pairs (two to three pairs were cropped out of each of the 500 pairs), of which 1,000 have good image quality. The data were split into a training set (80%), validation set (10%), and test set (10%), where the training and test sets come from different rat models.
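The slice pairing and the dataset split described above reduce to simple index arithmetic, sketched below. The 4,000-slice HR count follows from the 8:1 voxel-size ratio against 500 LR slices; the concrete split indices are illustrative, and the paper additionally keeps training and test rats disjoint, which a plain index split does not enforce.

```python
import numpy as np

# One HR slice (10 um) out of every eight matches one LR slice (80 um).
hr_indices = np.arange(4000)          # 8 x 500 HR slices over the same extent
matched_hr = hr_indices[::8]          # -> 500 LR-HR slice pairs

# 80/10/10 split of the 1,000 screened LR-HR image pairs.
pairs = np.arange(1000)
train, val, test = pairs[:800], pairs[800:900], pairs[900:]
```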

Training settings
The generator network R2-RCAN is made up of 10 RCAB-groups and 10 Res2-groups that are interconnected alternately. Each RCAB-group contains 20 RCABs, and each Res2-group contains five Res2blocks. The convolution kernel size for […].

FIGURE 5
Box plot of PSNR and PI on the test set. R2-RCANGAN (red) has good reconstruction accuracy and perceptual quality. In terms of PSNR, R2-RCAN (blue) is the best, but it has poor PI performance. In terms of PI, ESRGAN (green) is the best, but its PSNR is too low. (A) PSNR on the test set, (B) PI on the test set.

FIGURE 6
PSNR-PI plane on the test set. The PSNR-oriented methods such as R2-RCAN (blue) have very high reconstruction accuracy but very poor perceptual quality. ESRGAN (green) has the best perceptual quality, but its reconstruction accuracy is too low. Only R2-RCANGAN (red) does not fall into either extreme in the scatter plot, balancing reconstruction accuracy and perceptual quality.

FIGURE 7
Visual comparison of different methods. R2-RCAN (blue) has the highest reconstruction accuracy, but loses a lot of image texture. ESRGAN (green) has the best perceptual quality, but shows many erroneous textures and the worst reconstruction accuracy. R2-RCANGAN (red) performs well in both reconstruction accuracy and perceptual quality.

As shown in Figure 4, there are feature maps for each block of VGG19. The feature maps of conv1_2 and conv2_2 contain a large number of high-frequency textures such as bone trabeculae, which is important high-frequency information needed for our SR model. Thus, we attribute a higher weight to the first two feature maps.
Our network is trained with the Adam optimizer with β1 = 0.9 and β2 = 0.999. The initial learning rate is set to 1 × 10⁻⁴ and is halved every 10⁵ back-propagation iterations. We train R2-RCAN for 300 K iterations and R2-RCANGAN for 150 K iterations. The model is trained on a GeForce RTX 2080 Ti.
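The step decay described above can be written as a small helper. The floor-division boundary behaviour, i.e. halving exactly at iteration 10⁵, is an implementation choice.

```python
def learning_rate(iteration, base_lr=1e-4, halve_every=100_000):
    """Learning-rate schedule from the training settings: start at 1e-4 and
    halve every 1e5 back-propagation iterations."""
    return base_lr * 0.5 ** (iteration // halve_every)


assert learning_rate(0) == 1e-4
assert abs(learning_rate(150_000) - 5e-5) < 1e-18     # R2-RCANGAN: 150 K iters
assert abs(learning_rate(299_999) - 2.5e-5) < 1e-18   # R2-RCAN: up to 300 K iters
```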
To ensure the accuracy of pre-clinical images while obtaining better perceptual quality, we validated the SR performance in terms of two widely used image quality metrics: PSNR and the perceptual index (PI) [47]. PSNR, a typical distortion measure, is used to evaluate the reconstruction accuracy of SR; a higher PSNR means better reconstruction accuracy. PSNR is calculated by Eq. 8:

PSNR = 10 · log₁₀(MAX_I² / MSE),    (8)

where MAX_I is the maximum possible pixel value and MSE is the mean squared error between the HR image I_HR and the SR image I_SR. Since PSNR and MSE are closely related, it is reasonable to expect that a model trained with the MSE loss will have a high PSNR. However, even though a higher PSNR typically indicates higher reconstruction accuracy, it considers only the per-pixel MSE, which makes it fail to capture perceptual differences. To remedy this shortcoming, we use the PI, a no-reference measure proposed by the 2018 PIRM Challenge to evaluate the perceptual quality of SR [47]. It is calculated from Ma's score [48] and the natural image quality evaluator (NIQE) [49], and has been widely adopted in recent years. A lower PI represents better perceptual quality. PI is given in Eq. 9:

PI = ((10 − Ma) + NIQE) / 2.    (9)

Table 1 and Figure 5 summarize the PSNR and PI of each method. In terms of PSNR, our proposed R2-RCAN sets a new state of the art on the rat fracture dataset. In terms of PI, ESRGAN gets the best score. This does not, however, mean that these are the best algorithms, because reconstruction accuracy and perceptual quality are at odds with each other. As shown in Figure 6, the PSNR-PI plane allows a better weighing of reconstruction accuracy against perceptual quality. The PSNR-oriented methods have very high reconstruction accuracy but very poor perceptual quality. ESRGAN has the best perceptual quality, but its reconstruction accuracy is very low (its PSNR is even lower than that of bicubic interpolation in Table 1). Only our proposed R2-RCANGAN achieves competitive scores in terms of PSNR and the second-best values in terms of PI. It maintains as much accuracy as possible in pre-clinical images while having good perceptual quality: R2-RCANGAN loses 4 percent of PSNR while improving PI by 32 percent compared to R2-RCAN.
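Both metrics reduce to short formulas once their ingredients are available; a numpy sketch of Eqs. 8-9 follows. Ma's score and NIQE come from separate learned/statistical models and are treated here as given scalar inputs.

```python
import numpy as np


def psnr(hr, sr, max_i=255.0):
    """Eq. 8: PSNR = 10 * log10(MAX_I^2 / MSE), in dB."""
    mse = np.mean((np.asarray(hr, float) - np.asarray(sr, float)) ** 2)
    return 10.0 * np.log10(max_i ** 2 / mse)


def perceptual_index(ma_score, niqe):
    """Eq. 9: PI = ((10 - Ma) + NIQE) / 2; lower is better."""
    return 0.5 * ((10.0 - ma_score) + niqe)


hr = np.full((8, 8), 128.0)
sr = hr + 5.0                    # uniform error of 5 -> MSE = 25
value = psnr(hr, sr)             # 10 * log10(255**2 / 25) ~= 34.15 dB
```

Note that `max_i` must match the image's dynamic range (255 for 8-bit data); using the wrong range shifts every PSNR value by a constant and breaks comparability between methods.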
In this study, several samples were selected to compare the SR results of the different methods. Figure 7 confirms that the PSNR-oriented methods (SRCNN, EDSR, RRDB, RCAN, and R2-RCAN) produce blurry results, while the GAN-based methods (ESRGAN and R2-RCANGAN) restore more anatomical content and are better suited to human perception. PSNR-oriented methods may fail to recover fine structure needed for fracture evaluation, as shown by the blue boxes in Figure 7A. In Figures 7B,C, the green boxes mark trabecular bone. These results indicate that PSNR-oriented methods can significantly suppress noise and artifacts, yet they yield poor image quality as judged by a human observer: they implicitly treat noise as independent of local image properties, whereas the human visual system's sensitivity to noise depends on local contrast, intensity, and structural variations. Most importantly, the texture of the bone trabeculae is smoothed out as noise during this large-factor SR reconstruction. It can also be observed that the GAN-based models introduce false textures and strong noise. In particular, in Figure 7B, the trabeculae are incorrect (green box) and additional noise is generated (yellow arrow) in the result of ESRGAN. Our proposed R2-RCANGAN maintains high-frequency features and recovers more realistic images with lower noise than ESRGAN. Its PSNR is also not significantly lower than that of the PSNR-oriented methods, so it obtains pleasing results in terms of both PSNR and PI, generating more visually pleasant results with high reconstruction accuracy than the other methods.

Model performance based on different training data
We analysed the performance of R2-RCANGAN trained on different datasets. There are two groups of paired LR and HR images: 1) LR and HR images are real data scanned by Micro-CT. 2) The LR image is a bicubic downsampled version of the HR image; this downsampling approach to obtaining paired data is commonly used in SR studies. We trained R2-RCANGAN with the same hyperparameter settings on each of the two training sets and compared the performance of the two models on the test set. The quantitative results are in Table 2.
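For reference, the synthetic pairing in group 2 can be approximated in a few lines of numpy. Block averaging stands in for bicubic downsampling here to keep the sketch dependency-free; true bicubic resampling would come from an imaging library such as Pillow or OpenCV.

```python
import numpy as np


def downsample_8x(hr):
    """8x downsampling by block averaging: a stand-in for the bicubic
    downsampling used to build the synthetic LR training set."""
    h, w = hr.shape
    return hr.reshape(h // 8, 8, w // 8, 8).mean(axis=(1, 3))


hr = np.random.default_rng(0).random((320, 320))   # one HR patch
lr_synthetic = downsample_8x(hr)                   # matching 40 x 40 LR patch
```

Such synthetic LR images share none of the scanner's real corruption (noise, blur, partial volume effects), which is the hypothesized reason a model trained on them transfers poorly to real LR scans.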
The results demonstrate that the R2-RCANGAN trained on real data achieves higher scores on both evaluation metrics. We present typical results in Figure 8, which contains the trabeculae and the fracture site. These results demonstrate that the model based on downsampled data performs much less well than the model based on real data. The quality of its reconstructed images of both the trabeculae (green box in Figure 8A) and the fracture sites (blue box in Figure 8B) is very poor, even similar to the LR images. Surprisingly, the difference in the PSNR values in Table 2 is not as pronounced as Figure 8 suggests. This could be because downsampled LR-HR image pairs are better matched and do not suffer from the positional errors of the real LR-HR image pairs, which makes their reconstruction results easier to compare pixel by pixel. In summary, for the data in this study, SR models trained on the commonly used downsampled data are not suitable for medical imaging devices. In the field of medical research, realistic paired images are essential.

Discussions
Prior studies have noted that bone microstructure is a significant predictor of osteoporosis and fracture risk [1,50,51]. However, the spatial resolution of the best CT imaging technologies is only comparable to, or slightly finer than, human trabecular bone thickness [52], resulting in fuzzy representations of individual trabeculae with significant partial volume effects, which introduce significant errors into measurements and interpretations. Micro-CT is therefore suitable for imaging bone microstructure. It is also well known that ionizing radiation is harmful to animals and humans [10,11,53]; even so, a more accurate medical diagnosis requires imaging that inevitably exposes the body to radiation. We are committed to reducing ionizing radiation while maintaining the resolution of Micro-CT images. The first question in this study was determining the SR factor. Previous pre-clinical and clinical image studies have focused on small-factor SR [39,40]. Thus, the specific objective of this study was to establish an 8× SR model, contributing to the development of large-factor SR for medical imaging. In previous SR studies, the LR image is a downsampled version generated from the HR image [19][20][21]. In order for our SR model to learn the real corruption of Micro-CT, our LR images are obtained from equipment scans. Given that our samples are live rats, the LR and HR images do not match because of positional offsets of the samples. We use feature point detection and matching algorithms to create LR-HR image pairs, which support our model in learning the real Micro-CT image corruption. We also trained a model based on downsampled data, which was much less effective than the model trained on real data. These results further support the importance of real data for medical SR.
Our SR model focuses not only on image reconstruction accuracy, but also on perceptual quality. A negative relationship between reconstruction accuracy and perceptual quality has been reported in the literature [54]. In this study, PSNR is used to assess accuracy and PI to assess perceptual quality. R2-RCANGAN combines Res2Net, RCAN, and a U-Net discriminator. Its generator R2-RCAN has the advantage of adaptive channel attention while increasing the network width, making the network deep enough and increasing its multi-scale feature extraction capability. In terms of PSNR, the generator R2-RCAN sets a new state of the art. But PSNR-oriented SR models share the same problem: they tend to output over-smoothed results without sufficient high-frequency details; put simply, their outputs correlate poorly with human subjective evaluation and have low perceptual quality. R2-RCANGAN therefore incorporates a stable U-Net discriminator and an adjusted loss function, which increases perceptual quality substantially at a small cost in reconstruction accuracy. Its PI is only slightly worse than that of ESRGAN, but its reconstruction accuracy is much higher. Thus, we have designed an effective SR model: R2-RCANGAN satisfies the accuracy requirements of pre-clinical images and matches the perceptual quality of the human visual system.
Despite these advances, several outstanding questions remain. Firstly, GAN training produces some unpleasant erroneous textures and requires much longer training time [31,32]. More efficient architectures should be investigated; optimizing the model structure can increase training efficiency and save computational resources. Secondly, the technical route of this research is applicable to various medical imaging fields such as X-ray, CT, and MRI. We may create a personalized SR model for a specific medical scene by obtaining LR-HR datasets from different scanning devices. A limitation of this study is that all of the data were scanned on a single device, so applying images from other devices to this model may introduce bias; the accuracy, stability, robustness, and extensibility of R2-RCANGAN should be further assessed and validated. Thirdly, there is abundant room for progress in determining more suitable measures for evaluating SR results. Several studies have proposed that PSNR cannot capture and accurately assess the image quality perceived by the human visual system [31,47,54]; further study of more scientific evaluation metrics is therefore suggested. Finally, CT imaging assessment during rehabilitation is also critical, both for fracture rehabilitation in humans and for assessing fracture healing in animal experiments. The assessment of post-fracture rehabilitation relies on postoperative radiographs or CT [55], so multiple radiological examinations bring a large amount of ionizing radiation. Further work is necessary to establish the viability of R2-RCANGAN in evaluating the degree of healing during fracture recovery.

Conclusion
The purpose of this research was to perform large-factor (8×) SR reconstruction of Micro-CT images. Unlike previous SR research, which obtains LR images by downsampling HR images, our HR and LR images are real data obtained from Micro-CT, so the SR model can learn the real corruption process of Micro-CT imaging. The LR-HR image pairs are made with image feature point matching. We propose the new network R2-RCANGAN, based on Res2Net, RCAN, and U-Net. Its generator network R2-RCAN maintains the depth of the model while enhancing its ability for multi-scale feature extraction; in terms of PSNR, R2-RCAN sets a new state of the art compared to the other methods. We further add a more stable U-Net discriminator and adjust the weights of the loss function to fit this dataset, enabling R2-RCANGAN to generate reconstructed images that combine reconstruction accuracy and perceptual quality. Our R2-RCANGAN is the first attempt at large-factor pre-clinical image SR reconstruction and produces promising results. Further research should verify the effectiveness of SR during fracture rehabilitation, which has important clinical implications.

Data availability statement
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

Ethics statement
The animal study was reviewed and approved by The Animal Ethical and Welfare of Tianjin University.
Frontiers in Physics frontiersin.org