Intelligent Objective Osteon Segmentation Based on Deep Learning

Histology is key to understand physiology, development, growth and even reproduction of extinct animals. However, the identification and interpretation of certain structures, such as osteons, medullary bone (MB), and Lines of Arrested Growth (LAGs), are not only based on personal judgments, but also require considerable labor for subsequent analysis. Due to the dearth of available specimens, only a few quantitative histological studies have been proceeded for limited dinosaur taxa, most of which focus primarily on their growth, namely, LAGs and other growth lines without much attention to other histological structures. Here we develop a deep convolutional neural network-based method for automated osteohistological segmentation. Raw images are firstly divided into sub-images and the borders are expanded to guarantee the osteon regions integrity. ResNet-50 is employed as feature extractor and atrous spatial pyramid pooling (ASPP) is used to capture multi-scale information. A dual-resolution segmentation strategy is designed to observe the primary and secondary osteon regions from the matrix background. Finally, a segmented map with different osteon regions is obtained. This deep convolutional neural network-based model is tested on a histological dataset derived from various taxa in Alvarezsauria, a highly specialized group of non-avian theropod dinosaurs. The results show that large-scale quantitative histological analysis can be achieved by neural network-based methods, and previously hidden information by traditional methods can be revealed. Phylogenetic mapping of osteon segmentation results suggests a developmental pathway towards miniaturized body sizes in the evolution of Alvarezsauria, which may resemble the transition from non-avian dinosaurs to birds.


INTRODUCTION
Histology provides a unique chance to study the physiology, development, growth and even reproduction of extinct animals including non-avian dinosaurs. Most paleontological histological studies focus on the microstructure of fossil skeletal tissues, such as bones, cartilage, and also eggshells. The earliest histological work on dinosaur fossils can be traced back to Hylaeosaurus and Pelorosaurus by Mantell (1850), Mantell (1851), who provided the first description of microstructures in dinosaur bones, which was revisited by a thorough review on the early history of paleontological histology (Falcon-Lang and Digrius, 2014). Another review by Bailleul et al. (2019) provided a systematic summary of recent histological study advancements in techniques, discoveries, and perspectives.
Most histological studies are destructive for fossil specimens and there are only few non-avian dinosaur taxa have found available specimens for large scale comparative studies, thus quantitative histological studies are limited. Many quantitative histological studies focus on the growth and evolution of extinct organisms, including the gigantism and secondary dwarfism of sauropod dinosaurs (Sander et al., 2006;Lehman and Woodward, 2008), the growth pattern in dinosaurs and pterosaurs, particularly T. rex and Psittacosaurus (Erickson and Tumanova, 2000;Erickson et al., 2004;Padian et al., 2004;Erickson et al., 2009;Zhao et al., 2013). However, most of quantitative studies rely on manually collected statistics, for example counting the number of growth lines and calculating areas of osteons, etc. Machine learning based methods have only been recently introduced in histological studies by Haridy et al. (2021). Moreover, researchers may have different interpretations on the same structures, for example the identification of medullary bones, which are crucial to understand the sex and reproduction of extinct animals (Canoville et al., 2020). Although the identifications of most histological structures have reached tremendously congruent during the rapid advancements of fossil histology during the decade (Erickson 2014;Bailleul et al., 2019), the lack of baselines for evaluation, personal bias, and considerable cost of time in data processing are challenging future development of histological studies.
With the recent development of convolutional neural network (CNN) in image processing, CNN based feature learning methods have shown their great potential in biomedical image segmentation. Ronneberger et al. (2015) presented U-Net, which was named after its symmetrical encoding and decoding branches in a U-shape framework, for segmentation of neuronal structures in electron microscopic stacks. The network was trained in an end-to-end manner that directly provides segmentation maps based on the raw images, skip connections in its U-shape framework can improve the performance of the expansive path, thus the U-Net has become a popular architecture for biomedical image segmentation tasks.
Dilated convolution can obtain larger receptive field than traditional convolution layers. Gu et al. (2019) integrated dense atrous convolution (DAC) and residual multi-kernel pooling (RMP) blocks with encoder-decoder structure for medical image segmentation. DAC with different rates and RMP are employed to extract high-level features and preserve spatial information. In order to exploit the useful information from features in different levels, Wang et al. (2019) developed an attention module to integrate the outputs of multiple levels. Then atrous spatial pyramid pooling (ASPP) with dilated convolution was employed to capture multi-scale information.
The features learned by the standard convolution layers are not distinctive when the different target categories are similar and dense connection is a better choice. Gibson et al. (2018) adopted dense feature stacks in the encoder blocks for better feature extraction. Zhou Z. et al. (2020) presented UNet++, which consists of U-Nets of varying depths and the encoders are densely connected.  presented Dense-Res-Inception Net (DRINet) for organ segmentation of CT images. Dense connection blocks are used in the encoder blocks and residual inception block are used in the decoder blocks. Skip connections are not used for computation efficiency.
Besides, additional priori knowledge can further improve the performance. Oktay et al. (2018) designed a shape regularization loss to facilitate image segmentation. The shape information is encoded to a vector by a trained encoder. As for low-contrast images, Zhou S. et al. (2020) designed a High-Resolution Encoder-Decoder Networks for blurry medical image segmentation, which includes three main pathways: skip connection, distilling pathway and high-resolution pathway. Contour information has been integrated with segmentation task to improve accuracy.
The manual analysis of large-size fossil images is tedious and unfriendly for even paleontology experts, which also might involve subjective errors. Motivated by pursuing automatic objective analysis of osteon distribution in fossil histological images, we utilized the deep learning-based segmentation method to recognize the osteon regions of interest: vascular canal (VC) and circular lamellar bone (CLB). Since there are primary and secondary osteons distributed in bone thin sections, here a dual-resolution segmentation strategy is designed to observe the primary and secondary osteon regions from the background. The model performance was tested on a dataset comprising bone thin sections derived from Alvarezsauria (Dinosauria: Theropoda).

MATERIALS AND METHODS
The segmentation is realized by a deep convolutional neural networks (CNN) model Φ. The model's input is a fossil image I, whose height, width and channel number are H, W, and 3, respectively. The model's output is a segmentation map Σ, whose spatial size is the same with that of I. Σ has 3 channels that correspond to the background, VC region, and CLB region, respectively. Thus, the dimensions of Σ is H × W × 3. Σ i, j, k ∈ [0, 1] is a pixel value indexed by i th row and j th column, and indicates the probability that this pixel belongs to the k th category of background and osteon regions. By comparing the three channel indexes of a pixel, the channel index with the highest probability is the inferred category. In this way, Σ can be easily converted to a category index map X, as is visualized in Figure 1. Afterwards, the osteon regions are extracted from X using image structural analysis techniques. Each region is described by a triplet descriptor < c, a, R > , where c and a are the category index and region area, respectively, and R is a set of pixels located within this region. Finally, the region descriptors < c i , a i , R i > (i 1, 2, . . . . . . , N r ) are collected for further analysis.
The fossil osteon images have three characteristics. First, the osteon sizes are not in the consistent level across different dinosaur taxa and bones. Larger dinosaur bones can accommodate larger osteon sizes, so that some osteons captured onto the images are larger. Second, the fossil histological images have the resolution of 2,576 × 1936, February 2022 | Volume 10 | Article 783481 compared to the ordinary image's resolution such as 512 × 512. If we use the entire image as the segmentation model's input, the computation complexity will be high and the mini-batch based model training will be more difficult to conduct. Third, the primary osteons are usually significantly smaller than the secondary osteons and are often incomplete due to distortion from secondary osteons, which leads to a tradeoff between the small primary osteons' clearness and the large second osteons' shape-completeness if using a single resolution to view the image.
Considering the three problems, we propose the following three methods, respectively.

1) Osteon Size Sampling:
For the fossil images of a certain dinosaur taxon, we manually and randomly sampled N SP CLB regions of primary osteons and N SS CLB regions of secondary osteon. For each osteon region, its minimum bounding box is obtained and the diagonal length of the bounding box is used to represent the osteon size. The average CLB sizes of primary osteons and secondary osteons are calculated as l P and l S , respectively, which are utilized to determine the sub-image sizes in the following steps, so that the observing resolution suits the osteon size level of the given dinosaur taxon.
2) Split and Mosaic strategy: Instead of inputting the entire large image into the segmentation model Φ, we firstly split the entire image into M sub-images, then use Φ to process these sub-images, and finally mosaic the segmentation results of these sub-images together to a single large segmentation map. Therefore, the computation complexity is lower for the mini-batch based gradient backpropagation in training stage. Besides, the split of entire image into sub-images can significantly reduce the correlation and dependency of nearby osteon regions in the same image, during the learning stage, so as to enhance the generalization ability of model inference.
Given an entire image with the size of H × W, we split it into multiple sub-images with the size of h × w, as is depicted in Figure 1. The aspect ratio of the sub-image is the same with that of the entire image, namely, h/w H/W. For example, if h H/5 and w W/5 thus there is no overlapping, there will be 25 sub-images split from the original image. Generally, assuming H is not the integral multiples of h, we can pad the entire image with black borders so that the integral multiple relationship between image and sub-image size is satisfied. After the sub-images are collected, they are resized to 480 × 360 and then input to the segmentation model Φ. The output sub-segmentation maps are resized back to h × w and then mosaicked together according to the split order. Thus, the entire segmentation map is obtained ( Figure 1).
Considering that an osteon region might locate at the border of sub-images, the split operation will cause the osteon region separated into the two or more different sub-images. A simple strategy to relieve this problem is to pad the sub-images in the split operation and center-crop the sub-segmentation maps before the mosaic operation. As illustrated in Figure 1, when a sub-image (magenta rectangle with the size of h × w) is split from the raw image, we can pad it with its neighborhood (cyan rectangle with the size of 1.25h × 1.25w), so that the two primary osteon regions located on the upper border can be viewed completely. After the padded sub-image is processed by the segmentation model, the sub-segmentation map is centercropped back to the size of h × w before used to mosaic the entire segmentation map. If each sub-image is processed in this pipeline, the problem of separating osteon regions can be relieved.

3) Dual-Resolution Segmentation strategy:
As aforementioned, the primary osteon region could be significantly smaller than the secondary osteon region. Although the deep CNN based segmentation model has the multi-scale perception ability, we cannot split the raw image with a single sub-image size meanwhile guaranteeing that both the primary and secondary region are clearly and completely observed. Therefore, the dual-resolution segmentation strategy is designed, which use the higher and lower resolution to observe the primary and the secondary osteon regions, respectively.
Apparently, the observation resolution is determined by the sub-image size h × w in the split operation. With the average CLB sizes of primary and secondary osteon, labeled as l P and l S , we can build a reltionship between the sub-image sizes and the average CLB sizes, so that the observation resolution is adaptive to the osteon size. With the dualresolution strategy, we have Resolution 1: h P h P0 · l P , w P w P0 · l P Resolution 2: h S h S0 · l S , w S w S0 · l S , where h P × w P is the sub-image size for primary osteon, h S × w S is the sub-image size for secondary osteon, h P0 , w P0 , h S0 , and w S0 are the constants describing the linear rela-tionship, which satisfy h P0 /w P0 h S0 /w S0 H/W. As is shown in Figure 1, the same raw images are processed by two pipelines with two different resolutions. The upper branch is for the primary osteon, with the higher resolution and the smaller sub-image size. The lower branch is for the secondary osteon, with the lower resolution and the larger sub-image size. We train two segmentation models Φ 1 and Φ 2 , for the upper and lower branches, respectively. Φ 1 and Φ 2 share the same model structure but are trained on the two different datasets Δ 1 and Δ 2 . For example, if there are N 0 raw images for training and we use the split configurations in Figure 1, there will be 25 × N 0 sub-images in Δ 1 and 4 × N 0 sub-images in Δ 2 for the training of Φ 1 and Φ 2 , respectively.

Segmentation Model Structure
The proposed segmentation model is derived from DeepLabv3+ (Chen L. C.et al., 2018), whose structure is shown in Figure 2. DeepLabv3+ model is mainly characterized by its atrous spatial pyramid pooling (ASPP) block and brief structure with only one skip connection route. We customized the original DeepLabv3+ model as described below. 1) ResNet-50 (He et al., 2016) is used as the backbone feature extractor, outputting a feature map with 2048 channels and 16 output-stride. Output-stride is the ratio of input image resolution to map resolution. 2) ASPP block is configured with four branches: an image pooling branch, a 1 × 1 convolution branch and three 3 × 3 convolution branches with the atrous rates of {2, 4, 8}. Each branch's output map has 256 channels. The four branches' outputs are concatenated in channels and then compressed as a 256-channel map by a 1 × 1 convolution layer.
3) The skip connection pulls out the low-level feature map, i.e. the output of the 8th layer in ResNet-50. The low-level feature map with 64 channels is adapted to 32-channel by 1 × 1 convolution. The 256-channel high-level feature map is resized to 4× larger with bilinear-interpolation, and concatenated with the low-level features. Thus the final encoder output is a 288-channel multi-scale feature map. 4) In the decoder, the three convolution layers have 3 × 3 kernel and 1 × 1 stride. The first two layers have the 128-channel outputs, and the last layer has the 4-channel output with the 4 output-stride. The first two channels are for segmentation and the last two for contour prediction. Finally, the 4× upsampling with bilinear interpolation is used to resize the maps to the input size. Batch normalization is applied after convolutional layers to stabilize the training, and Rectified Linear Unit (ReLU) are used for activation. For the training of deep CNN segmentation model, we use the standard crossentropy loss.

Bone Thin Sections
For histological thin sections, we sampled 6 taxa of Alvarezsauria dinosaurs from different localities and ages (Table 1.). Long bones including fibula, tibia, femur, and metatarsal are sampled to acquire the growth information. The preparation of bone thin sections can be found in detail in Qin et al. (2021). The primary and secondary osteons are labelled by authors as groundtruth. In total 146 thin sections images with resolution of 2,576 × 1936 are obtained. Only thin section images under normal light are sampled.

Training Details
Considering the large pixel size of each thin section image, we first sub-divide each 2,576 × 1936 image into four 1,288 × 968 subimages (Data Availability: folder Sub-images). Due to their poorly defined boundaries of osteons and large portion of non-tissue areas, Haplocheirus images are excluded from the final dataset. But we still incorporate the raw images in the supplementary materials. Among the rest 146 raw thin section images, 107 images are selected as training dataset and 39 as testing dataset, users can divide the training and testing dataset as needed. Both the training and testing datasets comprise specimens from five Alvarezsaurian dinosaur taxa ranging from large to small body sizes to avoid additional bias in  sampling. The primary and secondary osteons in both the training dataset and testing dataset are labelled by authors as the groundtruth, in which every labeled image was cross validated by different authors to reduce the subjective influence to the minimum. As two segmentation models Φ 1 and Φ 2 are used for the primary and secondary osteon segmentation, the two models are training separately but under similar training parameters, except the batch sizes due to the significant size differences between primary and secondary osteons. We trained both models for 50 epochs and assigned batch sizes of 16 and 1 for primary and secondary osteons, respectively. Input image sizes are cropped into 256 × 256 in the center of the raw images (Figure 1).

RESULTS
The segmentation results are sorted in the order of intersection over union (IOU): , thus better segmented images tend to have an IOU score close to 1. After excluding outlier images with too few or no osteon, the segmentation results reached a best IOU score of 0.697 and 0.772 for primary and secondary osteons, respectively. The predicted segmentations and their corresponding original images and groundtruth are illustrated in Figure 3. Several outliers have reached better IOU scores but there are too few osteons in the vision field. The median IOU score for primary and secondary osteons are 0.476 and 0.409. We also calculate the dice, namely the Sørensen-Dice coefficient, to measure the performance of our models in segmenting both the vascular canal (VC) and the entire osteons in different taxa.

DISCUSSION
Alvarezsauria is a highly specialized group of non-avian theropod dinosaurs with worldwide distribution (Bonaparte 1991;Altangerel et al., 1993;Hutchinson and Chiappe, 1998;Naish and Dyke, 2004;Choiniere et al., 2010;Qin et al., 2019;Qin et al., 2021). They have a lot of unique features including the extremely reduced forelimbs and digits, and also resemble birds in many aspects including cranial kinesis, keeled sternums, and backward pointed pubis. During their evolution, Alvarezsaurian dinosaurs suffered sustained miniaturization, which might be due to the transition in diet (Xu et al., 2018;Qin et al., 2021).
Osteon is a fundamental structure in cortical bone. With a roughly cycle shape from transverse bone thin sections, its size and density are informative to both the growth and evolution of organisms. In our segmentation maps, the basalmost taxon Shishugounykus has an average size of individual osteon significantly higher than all other taxa and the density of its primary osteon is correspondingly lower. Two larger Early Cretaceous taxa, Bannykus and Xiyunukus have higher primary osteon density and smaller primary osteon areas than the smaller Late Cretaceous taxa Xixianykus and Qiupanykus (Figures 3, 4C,D). The decrease of primary osteon density and increase in sizes probably indicate more rapid growth due to the truncated period of growing that lead to the substantial miniaturization during the evolution of Alvarezsaurians.
We also noticed that almost all predicted areas of the primary osteons are significantly lower than the groundtruth in both training and testing datasets ( Figure 4D), while the predicted areas of the centered vascular canal are close to the groundtruth. Such contrast indicates that the localization of osteons, with the high contrast between dark-colored vascular canal and light-colored surrounding regions, is a relatively easy task for the neural network, but the subsequent expansion to segment the entire osteon unit is more challenging due to vaguely defined border in many images. Moreover, although we applied image-split, center-cropping, and padding to reduce the influences from incomplete osteons, bone remodeling during growth lead to the destruction of many primary osteons and bring challenges in correctly segmenting the entire unit. Another major problem leading to the poor performance in part of the dataset may be small sample size, particularly primary osteons in Shishugounykus ( Table 2) that shows outlier 0 in both precision and recall evaluation, indicating a zero value of true positives in segmentation results. There are only 7 and 2 primary osteons manually labeled in the training and testing datasets, respectively. Thin sections from other taxa comprise tens to hundreds of osteons thus avoid such extreme evaluation. On the other hand, generally wellpreserved fossils result in better performance, for example Xiyunykus and Banykus specimens, while more poorly preserved fossils with numerous fractures and non-tissue areas can only be deficiently segmented like the Qiupanykus specimen. By sampling better preserved fossils and use sub-divided images that bypass those noises may lead to better segmentation performance, but whether such sampling strategy introduces other kinds of bias remains to be tested. Due to the limit of available specimens, this study trained models based on a dataset covering neither the complete phylogeny nor the ontogeny of known Alvarezsaurian dinosaurs, which shall significantly influence the performance of segmentation prediction. As most well performed CNNs are trained on enormous datasets, which may comprise millions or even more images, adding new data to train our CNN is highly likely to improve its performance and may strengthen its generalization to more broadly sampled dinosaur specimens. Based on the increasing of paleontological images, data-driven studies using CNNs would soon facilitate paleontologists studying evolutionary questions.

DATA AVAILABILITY STATEMENT
The datasets presented in this study can be found in online repositories. The names of the repository/repositories and

AUTHOR CONTRIBUTIONS
ZQ and CY designed the project. ZQ provide the raw bone thin section images. FQ and YL designed the neural network and performed the training. Everyone put efforts in labeling the datasets groundtruth, analysing the results and preparing the manuscript.