A Dataset of Pulmonary Lesions With Multiple-Level Attributes and Fine Contours

Lung cancer is a life-threatening disease and its diagnosis is of great significance. Data scarcity and the unavailability of datasets are a major bottleneck in lung cancer research. In this paper, we introduce a dataset of pulmonary lesions for designing computer-aided diagnosis (CAD) systems. The dataset has fine contour annotations and nine attribute annotations. We define the structure of the dataset in detail, and then discuss the relationship between the attributes and pathology, and the correlation among the nine attributes using the chi-square test. To demonstrate the contribution of our dataset to computer-aided system design, we define four tasks that can be developed using our dataset. Then, we use our dataset to model multi-attribute classification tasks. We discuss the performance of the classification model in 2D, 2.5D, and 3D input modes. To improve performance, we introduce two attention mechanisms and verify the principles of the attention mechanisms through visualization. Experimental results show the relationship between different models and different levels of attributes.


INTRODUCTION
Lung cancer is caused by tumors and shows the fastest increase in morbidity and mortality. It has a significant negative impact on the health of subjects. Therefore, the early diagnosis of lung lesions is of great significance for the treatment of lung cancer.
The early form of lung cancer is categorized as pulmonary nodules, which are clinically examined using computed tomography (CT). The characteristics of pulmonary nodules in CT images are diverse, which creates a large workload for radiologists diagnosing the disease and leads to subjective assessment of features. Therefore, accurate and quantitative analysis of the appearance characteristics of lung nodules is essential for doctors to determine whether the nodules will grow into malignant tumors.
There are many publicly available datasets of pulmonary nodules. However, the existing datasets have some shortcomings, and the diversity of lesions is not balanced in them. For example, LIDC/IDRI (18) has rich attributes; however, it only marks nodules, so other pulmonary diseases cannot be predicted from it.
In this paper, we propose a dataset of lung lesions that could help the development of a pulmonary computer-aided diagnosis system. Our dataset is multi-centered, data-diversified, and informative. The proposed dataset is rich in lesion types and covers most of the signs of lung lesions. The lesions of the dataset are labeled with contours and attribute annotations by experienced radiologists using a professional tool. The attribute annotations are composed of nine attributes that are most useful for pathological assessment. In order to make the selected attributes hierarchical, we have selected multi-level attributes:
• Low-level attributes: margin, spiculation, etc., which can be judged mainly from the local features of the lesion;
• Middle-level attributes: pleural indentation, vessel convergence, etc., which need to be judged from the relationship between the lesion and its surrounding tissue, or cavity and calcification, which need to be judged from the relationship between local features and global features of the lesion;
• High-level attributes: the type and the location of the lesion, which need to be judged from the abstract features of the entire lesion.
In order to describe the proposed dataset clearly, we first report the statistics of our dataset and define its data storage format and data annotation rules. We then propose the contour annotation format. We also focus on the correlation between the attributes of the lesions. In order to study the relationship between multiple attributes, we analyze a total of 27 categories of 9 different attributes using the chi-square test and conditional probability, and infer the correlation between the attributes from these probabilities. In order to illustrate the practical significance of our dataset, we discuss several applications that could be studied using our dataset, and then select attribute classification for further study. First, we model the attribute classification and then explore the effect of the 2D, 2.5D, and 3D input modes on the accuracy of the model. Through experiments, we demonstrate that there is implicit competition between multiple attributes. We therefore use two attention mechanisms to filter different feature activations for different attributes. Our experiments show that the attention mechanisms have different effects on attribute classification.

RELATED WORK
In this section, we briefly discuss the existing datasets of lung nodules and the relevant classification methods.

LUNA16 Dataset
The LUNA16 (4) dataset was designed for the Open Pulmonary Nod Challenge, which screened 888 CT volumes from the large LIDC/IDRI dataset as challenge data. Their slice thickness is within 2.5 mm and the nodule size is greater than 3 mm, and the nodules were annotated by more than 3 experienced doctors using two-phase annotation. The detection annotations of a nodule in LUNA16 use the center coordinates and diameter of the inscribed circle of the nodule. In contrast, we use the gravity center coordinates as the center coordinates of the nodule and the longer geometric moment as the diameter to generate the world coordinates. For small round nodules, the two datasets are not much different, but when large lesions with irregular shapes need to be detected, our annotation approach achieves better results.

LIDC/IDRI Dataset
The LIDC/IDRI (18) dataset labels each nodule with a contour and nine attributes. Besides the benign and malignant nodules, the other eight attributes are all the appearance attributes of the nodules. In contrast, in our dataset, two of the attributes are the basic attributes of the lesion, five are appearance attributes, and two have relationships with the tissue surrounding the lesion in context. These attributes are richer and can better represent a lesion.

LISS Database
The LISS (19) database has 271 CT volumes, including 677 abnormal regions. These abnormal regions are divided into nine categories, which are called common CT imaging signs of lung disease (CISLs). In other words, there is only one CISL label for each abnormal region. Although this can help medical scholars learn a certain type of disease (12), it is less suitable for CAD system development, because it cannot capture the relationship between disease signs.

ILD Database
The ILD (20) database has 108 image series with more than 1946 ROIs. This dataset is a multimedia collection of cases of interstitial lung disease (ILDs). These ROIs are divided into 13 categories, which are lung tissue patterns from histological diagnoses of ILDs. The lesions in the ILD dataset are large, and the annotations are all high-level attributes. The dataset does not focus on a certain nodule, but on the pathology presented by a piece of tissue.

Lung Nodule Classification
The classification of lung nodules based on deep learning can be divided into two types of methods. The first type judges whether lung nodules are benign or malignant. Some of these methods predict malignancy directly from CT images, and other methods use different attributes of the nodules as an auxiliary basis for the judgment, such as (21)(22)(23). The second type classifies the disease, such as DeepLung (2) or CISLs classification (12). Dey et al. (21) built a network that produces multiple outputs from multiscale features to judge whether nodules are benign or malignant. Nibali et al. (22) used a three-column configuration to fuse the features generated from three axes. Song et al. (14,23) proposed methods that split the whole image into patches and predict the lesions. In contrast, Gao et al. (13) used the whole image for classification. With the development of computationally efficient computers, 3D models such as (24) have achieved impressive performance in nodule classification. He (12) proposed a method to generate images for data augmentation, which achieved a good improvement in performance. Zhu et al. (2) first detected the position of the nodules, then cropped the nodules and fed them into a classification model to predict one of nine attributes.
FIGURE 1 | Lesions in our dataset. Except for some small nodules, which are marked with a circle, such as the second image in the first row, other lesions are marked by a very close contour. The six images in the first row are different types of lesions, and in the second to fourth rows, each set of three images shows spiculation, lobulation, calcification, cavity, vessel convergence, and pleural indentation.
Multi-attribute classification is a problem to classify multiple targets using one model. There are currently two approaches to solve this problem. The first is to regard it as a classification task with a fixed number of categories, and solve attribute correlation in one model by using multiple branches to decompose the relationship between multiple targets onto each branch. The second is to treat it as a multi-label classification task, with the positive attribute as the label of the lesion, then each lesion has a floating number of labels, and the labels are decoupled using different methods. In this paper, we use the first method to classify different attributes in a model using a fixed number of branches, and use two attention mechanisms to help decouple the correlation among the attributes.

LUNG LESION DATASET
In this section, we provide a description of our dataset. CT data were collected from four hospitals. The body parts examined are mainly the chest and abdomen. Among them, the chest CT scans were mostly thin-slice (less than 3 mm), and the abdomen CT scans were mostly thick-slice (greater than or equal to 5 mm). Figure 1 shows examples of lesions in our dataset. As shown in Figure 1, except for some small nodules, which are marked with circles, such as the second image in the first row, other lesions are marked by a very close contour. Table 1 shows the parameter comparison of our dataset with several other public datasets. As with LUNA16, our dataset annotates lesions with contours, as shown in Figure 1. Compared with box and polygon annotation, contour annotation generalizes better to different tasks, such as localization, detection, and segmentation. At the same time, although the number of scans in our dataset is not the largest, the number of lesion annotations and the range of lesion sizes are the largest among the compared datasets. These annotations support more robust models. Moreover, the slice thickness of our dataset is relatively uniform, especially compared to LUNA16, which reduces unnecessary processing of the data and makes the dataset easier to use.

File Storage and Annotation Format
The raw data obtained from the hospital contains some sensitive information of subjects, and the data collected from different hospitals are stored in different ways, making the data difficult to use directly for analysis. Therefore, we first desensitize the data by removing subjects' sensitive information and retain only the necessary information, such as weight. Then, we store the CT volumes and annotation files as described below. We define the directory structure to store files as follows: ct_type/hospital/year/month/day/subject_id/series_id.
The directory with series_id SE01 stores the CT data in DICOM format, and the directory with series_id SE01_01_0n stores the contour annotation files aid_loc.anno, where n is the identification number of the doctor who annotated the scans, aid is the number of the annotation in the CT volume (used for correspondence with the attribute information), and loc is the slice number in the CT volume, described in the DICOM file as SliceLocation (0020, 1041). Each anno file represents one annotation. Each anno file has a different aid, but two anno files can have the same loc, indicating that the two annotations are in the same slice. An anno file uses a dictionary to store the annotation information needed in the CAD tasks. The keywords of the anno format are SeriesID, NoduleSerialNumber, InstanceNumber, Origin, Dimension, Spacing, Coords, XMin, XMax, YMin, YMax. Among them, SeriesID is the unique number of a DICOM volume, which is described as SeriesInstanceUID (0020, 000E); NoduleSerialNumber and InstanceNumber are the aid and loc mentioned above, respectively; Origin, Dimension, and Spacing are taken from the DICOM volume; Coords holds the contour coordinates of the annotation, with values relative to the size of the slice; and (XMin, YMin) and (XMax, YMax) are the coordinates of the lower left and upper right corners of the bounding box of the annotation. All CT volumes in our dataset contain lesions; volumes without lesions have been removed by manual screening of RIS reports. For repeated subject numbers, such as two volumes of one subject, we map one of them to a new subject number and retain the correspondence so that the original number can be restored.
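The naming rules above can be illustrated with a small sketch. It only parses a path of the form described in the text; the example path values (`chest`, `hospital_a`, `S0001`), the `AnnoRef` class, and the `parse_anno_path` helper are hypothetical names introduced for illustration, and the on-disk serialization of the dictionary inside an anno file is not specified here, so it is not read.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class AnnoRef:
    """Information recoverable from an annotation path alone (hypothetical helper)."""
    series_dir: PurePosixPath  # .../subject_id/SE01_01_0n
    doctor_id: int             # n in SE01_01_0n
    aid: int                   # annotation number within the CT volume
    loc: str                   # slice identifier, kept as text


def parse_anno_path(path: str) -> AnnoRef:
    """Split '<...>/SE01_01_0n/aid_loc.anno' into its documented parts."""
    p = PurePosixPath(path)
    if p.suffix != ".anno":
        raise ValueError(f"not an anno file: {path}")
    # The file stem is 'aid_loc'; split only on the first underscore.
    aid_text, loc = p.stem.split("_", 1)
    # The annotator number is the last underscore-separated field of the series id.
    doctor_id = int(p.parent.name.rsplit("_", 1)[1])
    return AnnoRef(p.parent, doctor_id, int(aid_text), loc)


# Hypothetical example path following ct_type/hospital/year/month/day/subject_id/series_id.
ref = parse_anno_path("chest/hospital_a/2018/05/17/S0001/SE01_01_02/3_57.anno")
print(ref.doctor_id, ref.aid, ref.loc)
```

Keeping `loc` as text is a deliberate choice in this sketch, since the exact numeric format of the slice identifier in the file name is not stated.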

Two-Phase Annotation Process
We use a two-phase annotation process to label the lesions. We label the contours of the lesions in the first phase, then label the attributes of the lesions in the second phase.

Contour Annotation Criterion
The contours are marked by experienced radiologists. In order to save the doctors' time and to increase the density of lesions in the dataset, we first manually screen the RIS reports, retain the CT volumes whose description mentions a lesion, and remove the volumes without lesions from the dataset. In order to standardize the marking process, we agreed with the doctors on the following rules for marking lesions:
• Mark all visible lesions;
• If the lesion is too small to draw the contour, circle the lesion with a circle tool;
• If the lesion is larger than one slice, mark the lesion every three consecutive slices;
• Draw a contour as close as possible to the edge of a lesion.
After the marking process, we perform a secondary screening to remove the annotations which are too discontinuous to be processed as contours. Then, we convert the annotations into anno format and mark lesion numbers. In this way, the contour annotations and the attribute annotations correspond with respective file names.

Attribute Annotation Criterion
After discussion with the doctors, we selected nine attributes that are commonly used in clinical diagnosis as attribute annotations for the dataset. A detailed description of these attributes will be provided in section 5.2. Each lesion is independently labeled by a doctor, and we record the doctor's number for each lesion so that the doctor can be identified if an error is discovered in the annotation.
In order to simplify the labeling of attributes, we implement an attribute labeling tool to collect and manage labels. We associate the slice of the contour with the lesion number so that it is convenient to label the attributes with the corresponding slice. When the attribute information is marked, the corresponding subject number and label number are recorded to correspond to the contour number. It should be noted that the contour annotation and the attribute annotation are not one-to-one matched. Some problematic contour annotations are filtered out in the previous step, and no attribute annotation is performed. Finally, we only select lesions with both contour and attribute annotations into the dataset. The number of attributes is reported in Table 1. As can be noted, the categories of some attributes are very unbalanced. This brings great challenges to the performance of our attribute classification algorithm.

ATTRIBUTES AND PATHOLOGY
We initially selected 15 attributes that are commonly used in clinical diagnosis, and then selected 9 attributes for our dataset based on their importance. The number of categories of these attributes is not balanced and the distributions are not independent. Here we briefly describe the importance of these attributes in clinical diagnosis and then discuss the correlation between attributes from the statistics point of view.

Attributes Description
Among the 9 attributes we selected, besides the basic attribute, lesion type, and lesion location, there are vessel convergence and pleural indentation which represent the relationship between the lesion and the surrounding tissue. On the other hand, margin, calcification, lobulation, spiculation, cavity represent the apparent features of the lesion. The description of the significance of these nine attributes is as follows.

Lesion Type
The first row of Figure 1 shows six different lesion types. For the lesion type, we choose placeholder, nodule, ground glass opacity, air containing space, mutation, and pleural effusion. The difference between placeholders and nodules is that lesions with a diameter of less than 30 mm are nodules, and those larger than 30 mm are placeholders. Except for the difference in size, the other attributes of the two lesion types are roughly similar. The air containing space is different from the cavity in pathology. The air containing space (Figure 1, the fifth image in the first row) is a pathological enlargement of the physiological cavity in the lung, while cavities (Figure 1, the last three images in the third row) often appear in nodules or placeholders. In air containing space lesions, the wall of the lesion is thinner and more uniform, the lesion mostly occurs in the subpleural area, and the size varies greatly. This means that the location of the air containing space is fixed and there are no apparent attributes such as spiculation and lobulation.

Lesion Location
The location of the nodule is represented by five categories of lobes: the right upper lobe, the right middle lobe, the right lower lobe, the left upper lobe, and the left lower lobe. Statistics show that the occurrence of lesions has little relationship with the location. The lesion location is only a basic attribute of the lesion, and it cannot be used as a basis for judging its pathological nature. Some lesions are large and span multiple lung lobes, so we mark them as 0 and do not include them in the five categories above.

Margin
The margin attribute describes whether the outer boundary of a nodule is clear. We defined two main categories for this attribute: clear and unclear margin. Although the margin of a benign mass is often smooth and that of a malignant mass is often unclear, inflammation may also cause an unclear margin in a placeholder. Therefore, the margin cannot be used as the sole basis for judging whether a lesion is benign or malignant, and needs to be considered in combination with other attributes.

Calcification
The calcification attribute describes lesions whose density is significantly higher than other soft tissues in the mediastinal window, usually with CT values above 100 HU. The first three images in the third row of Figure 1 show lesions with calcification. The white region in the images represents calcification. Calcification is a pathologically metamorphic lesion, which is more common in the healing stage of ductal tuberculosis lesions in the lung tissue or lymph nodes; calcification can also occur in tumor tissues or cyst walls. Usually, the greater the proportion of calcification in the lesion, the greater the likelihood of its being benign. Based on this, we classify the calcification attribute into three categories: no, partial, and total calcification.

Lobulation
The lobulation attribute indicates that the nodule or mass grows at different speeds in various directions or is blocked by the surrounding structure. The contour may have several arc-shaped protrusions separated by concave notches, which together form a lobulated shape. The last three images in the second row of Figure 1 show lesions with lobulation. We can clearly see the convex parts of the masses. We simply define two categories for this attribute: with and without lobulation.

Spiculation
The spiculation attribute is characterized by a radial, unbranched, straight, and strong thin line shadow extending from the edge of the nodule to the periphery, and the proximal end of the shadow is slightly thicker. The first three images in the second row of Figure 1 show lesions with spiculation. As shown in Figure 1, the burrs of the lesion are often not included in the scope of the annotation. The spiculation is not connected to the pleura, and is distinct from pleural indentation. We classify the spiculation attribute into no, short, and long spiculation; burrs of up to 5 mm are called short spiculation, and burrs longer than 5 mm are called long spiculation. The pathological basis of the burr is the fiber band in which the tumor cells infiltrate into the adjacent bronchial sheath and local lymphatic vessels, or the tumor promotes connective tissue formation. Burrs can also be seen in benign lesions such as nodular inflammatory pseudotumor and tuberculoma, but these burrs are longer and softer, and are more often formed by hyperplastic fibrous connective tissue. The possibility of lung cancer should be considered when there is a burr in a solitary lung nodule.

Cavity
The cancerous cavities are mostly located in the anterior segment of the upper lobe and the basal segment of the lower lobe. Most of the cavities larger than 3 cm in diameter are tumors. Most cancerous cavities present an irregular or lobulated outer edge and irregular inner edge. Those with a wall thinner than 4 mm are mostly benign lesions, and those thicker than 15 mm are mostly malignant lesions. The last three images in the third row of Figure 1 show the lesions of a cavity. We simply defined two categories for this attribute: with and without cavity.

Vessel Convergence
The vessel convergence attribute appears on the slices as one or more vessels around the pulmonary nodule that touch, cut, or pass through the placeholder at its edge. The appearance of vessel convergence is related to the size of the placeholder or nodule. Lesions less than 1 cm in diameter show fewer vessel convergence signs. The first three images in the last row of Figure 1 show lesions with vessel convergence. Images of cavities and vessel convergence are similar, because the blood vessels look like cavities when they are transected. A multivessel-directed lesion presents vessel convergence, which leads to a higher chance of malignancy. In particular, the phenomenon that one blood vessel leads to a nodule or tumor is seen not only in malignant nodules, but also in benign lesions such as tuberculosis, inflammatory pseudotumor, or hamartoma. We simply defined two categories for this attribute: with and without vessel convergence.

Pleural Indentation
The typical pleural indentation shows a small triangular shadow or a small trumpet shadow on the visceral surface of the visceral pleura. The bottom of the triangle is on the inside of the chest wall, the tip points to the nodule, and the nodule and the triangle shadow can be connected by a linear shadow. The last three images in the last row of Figure 1 show lesions with pleural indentation. Peripheral lesions with pleural indentation are often accompanied by other imaging signs. The pathological basis and imaging manifestations of pleural indentation in benign and malignant lesions are different. We simply define two categories for this attribute: with and without pleural indentation.

Correlation Between Attributes
In order to evaluate the correlation between attributes, we used the chi-square test. We assume that if two attributes are independent of each other, their data distributions should not affect each other, which means that the proportional relationship between the categories of one attribute is the same under each category of the other attribute. If the chi-square statistic calculated for the two attributes is greater than the critical value at the chosen significance level, there is a correlation between the two attributes. The approximate calculation equation for the chi-square test statistic is as follows:

$$\chi^2 = \sum \frac{(f_0 - f_e)^2}{f_e} \qquad (1)$$

where $f_0$ is the actual number of observations and $f_e$ is the expected number of observations. The larger the value of $f_e$, the more closely Equation (1) follows the chi-square distribution. To simplify the calculation of the chi-square test, we used a variant of Equation (1):

$$\chi^2 = N \left( \sum_{i=1}^{R} \sum_{j=1}^{C} \frac{f_{0,ij}^2}{f_{x_i} f_{y_j}} - 1 \right)$$

where $f_{x_i}$ and $f_{y_j}$ represent the number of samples in the $i$-th category of attribute $x$ and the $j$-th category of attribute $y$, respectively, $f_{0,ij}$ is the observed number of samples belonging to both categories, $R$ and $C$ are the numbers of categories of $x$ and $y$, and $N$ is the total number of samples. The degree of freedom $df$ of the independence test is calculated as follows:

$$df = (R - 1)(C - 1)$$

We use the data shown in Table 2 and select a significance level of 0.05 for calculation. Figure 2A shows the result of the chi-square test. As the results show, there is a strong correlation between the three attributes of margin, spiculation, and lobulation. Meanwhile, there is a strong correlation between vessel convergence and spiculation, margin, lobulation and lesion type, pleural indentation, and margin.
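As a minimal sketch of the independence test described above, the following NumPy code computes the statistic both from the observed and expected counts and from the simplified variant, and returns the degrees of freedom. The contingency table is made-up illustrative data, not taken from Table 2, and the function name is hypothetical.

```python
import numpy as np


def chi_square_independence(table):
    """Return (statistic, statistic_variant, df) for an R x C contingency table."""
    observed = np.asarray(table, dtype=float)
    n = observed.sum()                          # total number of samples N
    row = observed.sum(axis=1, keepdims=True)   # per-category counts of attribute x
    col = observed.sum(axis=0, keepdims=True)   # per-category counts of attribute y
    expected = row @ col / n                    # counts expected under independence
    stat = ((observed - expected) ** 2 / expected).sum()
    # Simplified variant: N * (sum_ij f0_ij^2 / (f_xi * f_yj) - 1)
    stat_variant = n * ((observed ** 2 / (row @ col)).sum() - 1.0)
    df = (observed.shape[0] - 1) * (observed.shape[1] - 1)
    return stat, stat_variant, df


# Illustrative 2 x 2 table (e.g., two binary attributes); values are invented.
stat, stat_variant, df = chi_square_independence([[30, 10], [20, 40]])
print(stat, stat_variant, df)
```

Both forms give the same value up to floating-point error. The statistic is then compared against the chi-square critical value for `df` degrees of freedom at the 0.05 level (3.841 for `df = 1`); `scipy.stats.chi2_contingency` offers a library alternative if SciPy is available.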
To further explore the specific relationship between the various categories of attributes, we calculated the conditional probability between a total of 27 categories for all attributes. The equation for calculating the conditional probability is as follows:

$$P(X \mid Y) = \frac{P(XY)}{P(Y)}$$

where $P(X)$ and $P(Y)$ represent the probabilities of two categories $X$ and $Y$, $P(X \mid Y)$ represents the probability of $X$ occurring when $Y$ is present, and $P(XY)$ represents the probability of co-occurrence of $X$ and $Y$. The value of $P(X \mid X)$ is 1, which is represented by white color in Figure 2B. We calculated the conditional probability between every pair of categories. As shown in Figure 2B, the white color represents a probability of 1 and the black color represents a probability of 0, while lighter gray represents higher conditional probability values. According to the statistical results, there is a strong correlation between different lesion types and other attributes. For placeholders, the margins are almost always unclear, the degree of lobulation is more obvious, and the degrees of spiculation and pleural indentation are the highest among all lesion types. The nodule, ground glass, and mutation categories have a small number of spiculation and lobulation signs, and more features of vessel convergence and pleural indentation. Cavity and pleural effusion have almost no other attributes, and their margins are all clear.
The margin attribute is highly correlated with lobulation, vessel convergence, and pleural indentation. When vessel convergence and pleural indentation are present, they are often accompanied by lobulation, and the margin is not very clear. The calcification attribute is concentrated in the nodules, and the cavity is also related to the margin and lobulation.
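The pairwise conditional probabilities discussed above can be sketched as follows, assuming each lesion is encoded as a 0/1 indicator vector over the categories (an encoding chosen for this illustration; the toy matrix below is invented and has only three categories rather than 27).

```python
import numpy as np


def conditional_probability(indicators):
    """Return M with M[i, j] = P(X_i | Y_j) from a (lesions x categories) 0/1 matrix."""
    a = np.asarray(indicators, dtype=float)
    joint = a.T @ a          # joint[i, j] = number of lesions having both i and j
    count = np.diag(joint)   # count[j]    = number of lesions having category j
    with np.errstate(divide="ignore", invalid="ignore"):
        # Divide column j by count[j]; categories that never occur give 0.
        cond = np.where(count > 0, joint / count, 0.0)
    return cond


# Four toy lesions described by three categories.
toy = [[1, 1, 0],
       [1, 0, 0],
       [0, 1, 1],
       [1, 1, 0]]
matrix = conditional_probability(toy)
print(matrix)
```

The matrix is generally not symmetric, since P(X|Y) and P(Y|X) differ whenever the two categories have different frequencies, which is why a heat map of it is read column by column.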

TASKS OF DATASET
Our dataset is rich in data and diverse in annotations, which means that our dataset can be used for several tasks and aid in the development of CAD systems. We recommend using our dataset for the following tasks:
(1) Detection: Some of the lesions in our dataset are smaller than 30 mm, which are nearly circular and suitable for lung nodule detection. This can be helpful for the initial diagnosis of lung cancer.
(2) Segmentation: The lesions larger than 30 mm are all marked with precise contours. These lesions are more complex in shape and are suitable for the lung lesion segmentation task. This can be helpful for volume measurement and further treatment.
(3) Classification: Multiple attributes of the lesion are suitable for multi-task lung disease prediction. This can be helpful to judge benign and malignant tumors.
(4) Reconstruction: At present, medical datasets are small, and their size is not enough for deep learning. Our dataset has various types of data, and we can use real data to train generative adversarial networks to generate synthetic data.
In this paper, we focus on exploring the correlation between attributes. We, therefore, perform multi-attribute classification and report our experimental results in section 6.

2D, 2.5D, 3D Modes for Classification
In order to study the importance of the input mode for the model, we use different data dimensions for the same data and the model for classification experiments. We use three input modes including 2D, 2.5D, and 3D. Assuming that the size of a CT volume is H × W × C, which corresponds to the three axes of X-Y-Z, the diameter of a lesion is d, the three input modes are expressed as follows:

2D Mode
The lesion is cropped from the grayscale slice in which it is located as a square patch of side length d, and fed to a 2D network for prediction. The input size is d × d × 1. The 2D input mode retains the spatial structure of the lesion in the X-Y direction, but the context information in the Z direction cannot be captured.

2.5D Mode
The grayscale image of the lesion and its neighboring slices above and below, five slices in total, are cropped with the bounding box and fed to the 2D network for prediction. The number of input channels is 5, and the input size is d × d × 5. Compared to the 2D input mode, the 2.5D input mode is supplemented by a fixed number of slices in the Z direction.

3D Mode
In the X-Y-Z direction where the lesion is located, the bounding box (d × d × d) is cropped and fed to the 3D network for prediction. The 3D network can capture the correlation along the Z axis of the whole lesion by convolution. Compared with 2D, the information of 2.5D is more detailed, but a 3D network has more parameters than a 2D network, which can cause the deep learning model to overfit when the training set is small. The architecture of our basic model is shown in Figure 3. In order to extract the relationship of the nine attributes, we use a ResNet-based network (25) to extract the characteristics of the nodule and then use nine classification branches to predict the nine attributes independently. We will explain the details and the results in section 6.1.
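The three input modes can be sketched with plain array slicing. This is an illustration under several assumptions that are not stated in the text: the volume is stored as a NumPy array indexed (z, y, x), the lesion center and diameter are given in voxels, the 2.5D mode takes two slices on each side of the central slice, and the crop lies entirely inside the volume. The function name `crop_modes` is hypothetical.

```python
import numpy as np


def crop_modes(volume, center, d, n_slices=5):
    """Return the 2D, 2.5D, and 3D crops around a lesion.

    volume: array indexed (z, y, x); center: (z, y, x) in voxels; d: side length.
    """
    z, y, x = center
    half, k = d // 2, n_slices // 2
    depth, height, width = volume.shape
    inside = (
        min(z - half, z - k) >= 0 and max(z - half + d, z + k + 1) <= depth
        and y - half >= 0 and y - half + d <= height
        and x - half >= 0 and x - half + d <= width
    )
    if not inside:
        raise ValueError("crop exceeds the volume; padding is not handled in this sketch")
    ys = slice(y - half, y - half + d)
    xs = slice(x - half, x - half + d)
    patch_2d = volume[z, ys, xs][..., None]                            # d x d x 1
    patch_25d = np.moveaxis(volume[z - k:z + k + 1, ys, xs], 0, -1)    # d x d x 5
    patch_3d = volume[z - half:z - half + d, ys, xs]                   # d x d x d cube
    return patch_2d, patch_25d, patch_3d


volume = np.arange(20 * 32 * 32, dtype=np.float32).reshape(20, 32, 32)
p2, p25, p3 = crop_modes(volume, center=(10, 16, 16), d=8)
print(p2.shape, p25.shape, p3.shape)
```

In the 2.5D crop the central channel is the same slice as the 2D crop, so the 2.5D input strictly adds context around what the 2D network already sees.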

Two Attention Mechanisms
Through the experiments, we found that there is an implicit competition between multiple attributes during training. In the training phase, when the loss value is stable, the accuracy of some attributes increases while the accuracy of other attributes decreases. To solve this problem, we add an attention module in front of each attribute classifier to focus the activation on the features which are useful for classification. In this way, different input features for attributes are extracted, which could mitigate the conflict between attributes. Inspired by (26)(27)(28), we employed soft-attention and self-attention, commonly used mechanisms that compute a weight matrix used to filter noise and to focus on important features. These two attention mechanisms are described below in our model, and Figure 4 shows the structure of the two attention modules.

Soft-Attention Module
As shown in Figure 4A, we add a soft-attention module (26) before feeding the features into the classifier to filter out shallower features with deeper features. While preserving the spatial structure, the attention module extracts a mask from the features to suppress noise which is not related to the attribute to improve accuracy.
Assuming that feature map $x \in \mathbb{R}^{N \times C_x \times H \times W}$ from the basic model is the input feature for the attention model, and feature map $x_g \in \mathbb{R}^{N \times C_g \times H \times W}$ is from a deeper layer as the gate, we first use a 1 × 1 convolutional layer to get the same number of channels $C_g$ for both features, then sum the features $x$ and $x_g$ together and apply the non-linear transform ReLU, which can be formulated as $\sigma_1(x) = \max(0, x)$. At this point, the feature $x$ is mixed with the richer semantic information of $x_g$, and we use a 1 × 1 convolutional layer to fuse the channel information while retaining the spatial information, and get a mask $x_m$ with values in [0, 1] through the sigmoid function, which can be formulated as $\sigma_2(x) = (1 + e^{-x})^{-1}$. Finally, we use the mask $x_m$ to spatial-wise reweight the feature map $x$ and get the output feature $\hat{x}$. After filtering by the soft-attention module, the features $\hat{x}$ are re-weighted by high-dimensional semantic information in the spatial dimension, which is more conducive to multi-attribute classification.
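A forward pass of the gating steps above can be sketched in NumPy, treating each 1 × 1 convolution as a per-pixel linear map over channels. This is an illustration only: the weights are random rather than learned, biases are omitted, and applying a 1 × 1 projection to both $x$ and $x_g$ is one reading of the description. The function names and weight shapes are assumptions.

```python
import numpy as np


def conv1x1(x, w):
    """1 x 1 convolution without bias: x is (N, C_in, H, W), w is (C_out, C_in)."""
    return np.einsum("oc,nchw->nohw", w, x)


def soft_attention(x, x_g, w_x, w_g, w_m):
    """Spatially reweight x with a mask gated by the deeper feature x_g."""
    mixed = np.maximum(0.0, conv1x1(x, w_x) + conv1x1(x_g, w_g))  # sum, then ReLU
    mask = 1.0 / (1.0 + np.exp(-conv1x1(mixed, w_m)))             # sigmoid, (N, 1, H, W)
    return x * mask, mask                                         # mask broadcasts over channels


rng = np.random.default_rng(0)
n, c_x, c_g, h, w = 2, 4, 8, 6, 6
x = rng.normal(size=(n, c_x, h, w))
x_g = rng.normal(size=(n, c_g, h, w))
x_hat, mask = soft_attention(
    x, x_g,
    w_x=rng.normal(size=(c_g, c_x)),  # projects x to C_g channels
    w_g=rng.normal(size=(c_g, c_g)),  # projects the gate, keeping C_g channels
    w_m=rng.normal(size=(1, c_g)),    # fuses channels into a one-channel mask
)
print(x_hat.shape, mask.shape)
```

Because the mask has a single channel, every channel of $x$ at a given pixel is scaled by the same factor, which is what makes this a spatial rather than a channel-wise reweighting.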

Self-Attention Module
As shown in Figure 4B, we add a self-attention module (27,28) before the features are fed to the classifier, to squeeze the spatial structure of a feature map into one vector with spatial information. Then, we gather and filter the information to enhance the activation related to that attribute, and add the information to the original feature map to enhance the feature. Assuming that the feature map $x \in \mathbb{R}^{N \times C_x \times H \times W}$ is generated from the basic model as the input feature of the attention model, we use a channel squeeze and spatial excitation branch to transform $x$, extract the spatial information, and reweight the original $x$ with the transform of itself. We use a global pooling layer to squeeze $x$ into a vector $z \in \mathbb{R}^{N \times C_x \times 1 \times 1}$. Then we use two fully connected layers to transform the vector $z$ to $\hat{z} = W_1 \sigma_1(W_2 \cdot z)$, with $W_1 \in \mathbb{R}^{C \times C/16}$, $W_2 \in \mathbb{R}^{C/16 \times C}$, and the activation $\sigma_1$. We also use the non-linear function $\sigma_2$ to transform the values to $[0, 1]$ and get the channel mask $x_m$. Finally, we use $x_m$ to channel-wise reweight the feature map $x$ and get the output feature $\hat{x}$. After filtering by the self-attention module, the features $\hat{x}$ are re-weighted by the information after squeeze and excitation in the spatial dimension, which is more conducive to multi-attribute classification.
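The squeeze, excitation, and channel-wise reweighting steps described above can be sketched in plain Python for a single sample stored as nested lists. This is an illustration under stated assumptions, not the authors' implementation: the function and argument names are hypothetical, bias terms are omitted, global average pooling is assumed for the pooling step, and only the reweighting (not any residual addition) is shown. A real model would use a tensor library.

```python
import math


def sigmoid(v):
    """sigma_2: maps a real value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-v))


def matvec(weights, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * a for w, a in zip(row, vector)) for row in weights]


def channel_reweight(x, w1, w2):
    """Reweight a feature map x of shape [C][H][W] channel by channel.

    w2 has shape (C/r) x C and w1 has shape C x (C/r), where r is the
    reduction ratio (16 in the text; any divisor of C works here).
    Returns the reweighted map and the channel mask.
    """
    # Squeeze: global average pooling turns each channel into one scalar.
    z = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in x]
    # Excitation: z_hat = W1 * ReLU(W2 * z).
    hidden = [max(0.0, h) for h in matvec(w2, z)]
    z_hat = matvec(w1, hidden)
    # The sigmoid produces one mask value per channel.
    mask = [sigmoid(v) for v in z_hat]
    # Every spatial position of channel c is scaled by mask[c].
    x_hat = [[[mask[c] * val for val in row] for row in ch]
             for c, ch in enumerate(x)]
    return x_hat, mask
```

For example, with two channels and zero weights, every mask value is `sigmoid(0) = 0.5`, so each channel is simply halved.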

EXPERIMENTAL RESULTS
In this section, we first verify that the proposed model can learn the correlation between attributes, and then empirically select the best input mode, and verify the attention mechanism on this input mode.
In our experiments, we used the part of the data with a thickness of 1.0-2.0 mm, which comprises 355 CT volumes and 2014 lesions labeled with 9 attributes. The data were split into a training set and a validation set at a ratio of 8:2, with 1,847 lesions in the training set and 163 lesions in the validation set. During training, we randomly select 30% of the data for data augmentation, i.e., random flip and rotation. As Table 2 shows, the number of samples per category in the dataset is unbalanced, which could affect the convergence of the model. We use weighted cross entropy loss to reduce the impact of data imbalance during the training phase.
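As a minimal sketch of how a weighted cross entropy loss counteracts category imbalance, the following plain-Python example scales each sample's loss by a per-category weight. The inverse-frequency weighting scheme and all function names are assumptions made for illustration; the source does not state how the weights were chosen.

```python
import math


def class_weights(counts):
    """Inverse-frequency weights (an assumed scheme): rarer categories
    receive larger weights. All counts must be positive."""
    total = sum(counts)
    k = len(counts)
    return [total / (k * c) for c in counts]


def weighted_cross_entropy(logits, target, weights):
    """Weighted cross entropy for one sample:
    -weights[target] * log(softmax(logits)[target])."""
    # Log-sum-exp with the maximum subtracted for numerical stability.
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(v - m) for v in logits))
    return -weights[target] * (logits[target] - log_sum)
```

With hypothetical counts of 90 and 10 samples, the rare category's weight is nine times that of the common one, so a misclassified rare sample contributes nine times more to the loss.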
In the experiments, each model has four blocks. The first one is a convolutional block and the other three are residual blocks. At the end of the model, there are nine classifier blocks, one for each of the nine attributes. We use the reweighted logistic loss to balance the categories. During the training phase, we set the learning rate to 0.01 with warm restart (29) and use SGD to optimize the model. The momentum was set to 0.09, the weight decay was set to $10^{-4}$, and the batch size was set to 64. Since the model converges quickly, we train each model for 200 epochs and choose the model with the smallest validation loss as the best model.
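A learning-rate schedule with warm restarts can be sketched as follows. This assumes the cosine annealing form commonly associated with warm restarts and a fixed restart period; the period of 50 epochs, the minimum rate of 0, and the function name are hypothetical values chosen for illustration, since the source only gives the initial learning rate of 0.01.

```python
import math


def warm_restart_lr(epoch, base_lr=0.01, period=50, min_lr=0.0):
    """Cosine-annealed learning rate that jumps back to base_lr
    every `period` epochs (the period is an assumed value)."""
    t_cur = epoch % period  # epochs elapsed since the last restart
    cosine = math.cos(math.pi * t_cur / period)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + cosine)


# Learning rates over the 200 training epochs mentioned in the text.
schedule = [warm_restart_lr(e) for e in range(200)]
```

Within each period the rate decays smoothly from `base_lr` toward `min_lr`, and the restart at the start of the next period returns it to `base_lr`.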
Because the data are imbalanced, no valid features can be learned for categories with very few samples, which results in low sensitivity of the model to these categories. As shown in Tables 3, 4, categories with too few samples, such as partial calcification and with cavity, were not recognized. For a given category, a prediction falls into one of four cases: true positive (TP), false positive (FP), true negative (TN), and false negative (FN).
To evaluate the imbalanced categories of each attribute, we use three metrics to score the results. Accuracy (ACC) is the basic metric to evaluate the result, which can be calculated as:

$$\mathrm{ACC} = \frac{TP + TN}{TP + FP + TN + FN}$$

Sensitivity (SE), also called the true positive rate, means the probability that a sick person is diagnosed as positive, which can be calculated as:

$$\mathrm{SE} = \frac{TP}{TP + FN}$$

The larger the SE value, the more sensitive our model is in diagnosing this category. Specificity (SP), also called the true negative rate, means the probability that a person who is actually not sick is diagnosed as negative, which can be calculated as:

$$\mathrm{SP} = \frac{TN}{TN + FP}$$

The larger the value of SP, the more accurate our model is for the diagnosis of this category. We average the accuracies of all categories for each attribute, and then average the scores of all attributes as the final score to represent the performance of the model.
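The three metrics and the two-stage averaging described above can be sketched in a few lines of Python. The function names, the dictionary layout, and the counts in the usage note are hypothetical and serve only to illustrate the arithmetic; they are not results from the dataset.

```python
def binary_metrics(tp, fp, tn, fn):
    """Return (ACC, SE, SP) for one category from its four counts."""
    acc = (tp + tn) / (tp + fp + tn + fn)
    # Guard against categories with no positive or no negative samples.
    se = tp / (tp + fn) if (tp + fn) else 0.0
    sp = tn / (tn + fp) if (tn + fp) else 0.0
    return acc, se, sp


def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


def model_score(counts_per_attribute):
    """Average category accuracies within each attribute, then
    average those attribute scores into one final score.

    counts_per_attribute maps an attribute name to a list of
    (tp, fp, tn, fn) tuples, one tuple per category.
    """
    attribute_scores = [
        mean([binary_metrics(*counts)[0] for counts in categories])
        for categories in counts_per_attribute.values()
    ]
    return mean(attribute_scores)
```

For instance, a category with the made-up counts TP = 8, FP = 2, TN = 88, FN = 2 has an accuracy of 0.96 but a sensitivity of only 0.8, which shows why accuracy alone can hide poor performance on rare categories.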

FIGURE 5 |
The results of the 2D, 2.5D, and 3D input modes. As can be noted, the 2D mode has better results on spiculation, lobulation, cavity, vessel convergence, and pleural indentation.

FIGURE 6 |
The results of the base model and two attention models. As can be noted, the self-attention module has better results on lobulation, pleural indentation, and lesion location attributes; the soft-attention module has better results on lesion type, margin, spiculation, calcification, cavity, and vessel convergence attributes.

Results for Input Modes
To select the most suitable input mode for the attribute classification of lung lesions, we train 2D, 2.5D, and 3D models with the same structure described in Figure 3. To keep the comparison fair, we do not adjust the hyperparameters for the different models. Each model was trained for 200 epochs with a batch size of 64. To evaluate the performance of the models, we chose the average accuracy of the model with the lowest validation loss as the metric. The average accuracy scores of the 3D, 2.5D, and 2D models are 0.7513, 0.7511, and 0.7816; the average sensitivities are 0.7671, 0.7006, and 0.7184; and the average specificities are 0.8305, 0.7995, and 0.8116, respectively. As Figure 5 shows, the three models have almost the same scores in lesion type and margin, and the model with the 2D mode has better scores in spiculation, lobulation, vessel convergence, and pleural indentation. Table 3 shows the accuracy, sensitivity, and specificity of each category for each attribute. From the experimental results, we note that the higher-level attributes, such as lesion type and lesion location, are more sensitive to the 3D mode, and the lower-level attributes, such as spiculation and lobulation, are more sensitive to the 2D mode.
During training, we noticed that the 3D model has more parameters than the 2D models, which led to longer training time and slower convergence. Meanwhile, the 2D model has better average accuracy than the 3D model. So, we chose the 2D mode as the basic model for the following experiments.

Results for Attention Mechanisms
To improve the performance of the basic model, we use two attention mechanisms to enhance the feature before feeding it to the classifiers. We call the model with the soft-attention module Soft-Att, and the model with the self-attention module Self-Att. Since the number of parameters of the two attention modules is not large, we use the same hyperparameters as the basic model to train the two models. As in the previous section, we train for 200 epochs with a batch size of 64 and take the accuracy of the model with the lowest validation loss as the metric. The average accuracy scores of the basic model, Soft-Att, and Self-Att are 0.7816, 0.8032, and 0.7763; the average sensitivities are 0.7184, 0.7183, and 0.7155; and the average specificities are 0.8116, 0.8117, and 0.8128, respectively.
As Figure 6 shows, the soft-attention module has better results on margin, vessel convergence, lesion type, and spiculation attributes, and the self-attention module has better results on lobulation, pleural indentation, and lesion location attributes. Due to the near-zero sensitivity of calcification and cavity attributes, we do not take their accuracy into comparison. As reported in Table 4, the two models with attention modules have better performance than the basic model.
The heatmaps in Figure 7 visualize the attention mechanisms. Compared with the basic model, the red value of soft-attention is concentrated at one point. This is because soft-attention uses higher-layer semantic information to filter the low-layer features, which makes the features spatially smoother and more focused. This is a good feature for high-level attributes because it is concentrated at the point that best reflects the attribute, but it does not fully reflect the local information relationship. Compared with the basic model, the red value of self-attention is more scattered in the spatial dimension. This is because self-attention extracts channel information by compressing spatial information using its own features, and it is more comprehensive in spatial information due to multi-channel fusion. This is a good feature for low-level attributes because its local information relationships are more spatially refined, but because of the noise in the spatial dimension, it may not be appropriate for high-level attributes.

CONCLUSION
This paper presents a dataset of lung lesions with fine contour and attribute annotations and explores the correlation between the attributes of the dataset. To demonstrate the contribution of this dataset to the development of CAD systems, we explore two issues of medical data modeling using attribute classification tasks.
The first issue is the effect of the 2D, 2.5D, and 3D input modes on the classification model. The 2D mode works well for low-level attributes that do not require local information relationships between lesions and surrounding tissues, while the 3D mode works better for high-level attributes that require higher contextual relationships. The 2.5D mode is a trade-off between the light weight of the 2D model and the context information of the 3D model.
The second is the impact of the two attention mechanisms on the model. Soft-attention can better handle the noise in the spatial dimension and concentrate on the features at one point, which is beneficial for the classification of high-level attributes. Self-attention can better integrate the spatial information in the channel dimension, and complement the local relationship in the spatial dimension, which is beneficial for the classification of low-level attributes.
In the future, we mainly want to explore and address the following three issues:
1. For the three categories of cavity, partial calcification, and long spiculation, the sensitivity is almost zero due to the high degree of the category imbalance. We will explore novel methods to improve the accuracy of these three categories.
2. We will use the correlation between attributes to establish a loss function suitable for multi-attribute classification from the statistical learning strategy.
3. There is not a single metric that can well measure the performance of a multi-attribute model. We will build evaluation metrics for multi-task modeling.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/supplementary material; further inquiries can be directed to the corresponding author/s.