A Multiparametric Method Based on Clinical and CT-Based Radiomics to Predict the Expression of p53 and VEGF in Patients With Spinal Giant Cell Tumor of Bone

Purpose This project aimed to assess the significance of vascular endothelial growth factor (VEGF) and p53 for predicting progression-free survival (PFS) in patients with spinal giant cell tumor of bone (GCTB), and to construct models for predicting these two biomarkers based on clinical and computed tomography (CT) radiomics features in order to identify high-risk patients and improve treatment.
Material and Methods A retrospective study was performed from April 2009 to January 2019. A total of 80 patients with spinal GCTB who underwent surgery in our institution were identified. VEGF and p53 expression and clinical and general imaging information were collected. Multivariate Cox regression models were used to verify the prognostic factors. The radiomics features were extracted from the regions of interest (ROIs) in preoperative CT, and important features were then selected by a support vector machine (SVM) to build classification models, which were evaluated by 10-fold cross-validation. The clinical variables were processed using the same method to build a conventional model for comparison.
Results Immunohistochemistry results were obtained for all 80 patients: 49 with high VEGF and 31 with low VEGF; 68 with wild-type p53 and 12 with mutant p53. Multivariate Cox regression analysis identified p53 and VEGF as independent prognostic factors for PFS. For VEGF, the Spinal Instability Neoplastic Score (SINS) was greater in the high-VEGF than in the low-VEGF group (p < 0.001). For p53, SINS (p = 0.030) and Enneking stage (p = 0.017) were higher in the mutant than in the wild-type group. The VEGF radiomics model built using 3 features achieved an area under the curve (AUC) of 0.88, and the p53 radiomics model built using 4 features had an AUC of 0.79. The conventional model built using SINS and Enneking stage had a slightly lower AUC of 0.81 for VEGF and 0.72 for p53.
Conclusion p53 and VEGF are associated with prognosis in patients with spinal GCTB, and the radiomics analysis based on preoperative CT provides a feasible method for the evaluation of these two biomarkers, which may aid in choosing better management strategies.


First Order Features, N=18
First-order statistics describe the distribution of voxel intensities within the image region defined by the mask through commonly used and basic metrics. Let:
• $\mathbf{X}$ be a set of $N_p$ voxels included in the ROI
• $\mathbf{P}(i)$ be the first order histogram with $N_g$ discrete intensity levels, where $N_g$ is the number of non-zero bins
• $p(i)$ be the normalized first order histogram and equal to $\mathbf{P}(i)/N_p$

Energy
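$$\text{energy} = \sum_{i=1}^{N_p} (\mathbf{X}(i) + c)^2$$
Here, $c$ is an optional value which shifts the intensities to prevent negative values in $\mathbf{X}$. This ensures that voxels with the lowest gray values contribute the least to Energy, instead of voxels with gray level intensity closest to 0. Energy is a measure of the magnitude of voxel values in an image. A larger value implies a greater sum of the squares of these values.

Robust Mean Absolute Deviation (rMAD)
$$\text{rMAD} = \frac{1}{N_{10-90}} \sum_{i=1}^{N_{10-90}} \left| \mathbf{X}_{10-90}(i) - \bar{X}_{10-90} \right|$$
Robust Mean Absolute Deviation is the mean distance of all intensity values from the Mean value, calculated on the subset of the image array with gray levels between, or equal to, the 10th and 90th percentile. Here, $\mathbf{X}_{10-90}$ denotes that subset, $N_{10-90}$ its number of voxels and $\bar{X}_{10-90}$ its mean.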
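As an illustration of the robust mean absolute deviation described above, the following is a minimal sketch in plain Python. The helper `percentile` (linear interpolation between closest ranks) and the sample list `roi_values` are assumptions made for this example; the percentile convention used in the study's software is not stated in the text.

```python
import math

def percentile(sorted_values, q):
    """Percentile (0-100) with linear interpolation; an assumed convention."""
    rank = (len(sorted_values) - 1) * q / 100.0
    lower = math.floor(rank)
    upper = math.ceil(rank)
    fraction = rank - lower
    return sorted_values[lower] + (sorted_values[upper] - sorted_values[lower]) * fraction

def robust_mean_absolute_deviation(values):
    """Mean absolute deviation on the values between the 10th and 90th percentile."""
    ordered = sorted(values)
    p10 = percentile(ordered, 10)
    p90 = percentile(ordered, 90)
    # Keep only gray levels inside [p10, p90], inclusive on both ends.
    subset = [v for v in values if p10 <= v <= p90]
    mean = sum(subset) / len(subset)
    return sum(abs(v - mean) for v in subset) / len(subset)

# Hypothetical intensities with one extreme outlier (100).
roi_values = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
rmad = robust_mean_absolute_deviation(roi_values)
```

In this made-up sample the outlier and the minimum fall outside the 10th to 90th percentile band, so they do not influence the result.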

Root Mean Squared (RMS)
$$\text{RMS} = \sqrt{\frac{1}{N_p} \sum_{i=1}^{N_p} (\mathbf{X}(i) + c)^2}$$
Here, $c$ is an optional value which shifts the intensities to prevent negative values in $\mathbf{X}$. This ensures that voxels with the lowest gray values contribute the least to RMS, instead of voxels with gray level intensity closest to 0. RMS is the square root of the mean of all the squared intensity values. It is another measure of the magnitude of the image values. This feature is volume-confounded; a larger value of $c$ increases the effect of volume-confounding.
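The effect of the shift $c$ on Energy and RMS can be sketched as follows. This is a minimal plain-Python illustration; the function names and the sample values in `roi` are hypothetical.

```python
import math

def energy(values, c=0.0):
    """Sum of squared (shifted) intensities."""
    return sum((v + c) ** 2 for v in values)

def root_mean_squared(values, c=0.0):
    """Square root of the mean squared (shifted) intensity."""
    return math.sqrt(energy(values, c) / len(values))

# Hypothetical ROI intensities containing a negative value.
roi = [-2.0, 0.0, 1.0, 3.0]

energy_unshifted = energy(roi)        # -2 contributes 4, more than 0 or 1
energy_shifted = energy(roi, c=2.0)   # after the shift, the lowest value contributes 0
rms_unshifted = root_mean_squared(roi)
```

With $c = 2$ the lowest gray value (-2) maps to 0 and contributes nothing, which is the behavior the shift is meant to guarantee.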

Standard Deviation
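$$\text{standard deviation} = \sqrt{\frac{1}{N_p} \sum_{i=1}^{N_p} (\mathbf{X}(i) - \bar{X})^2}$$
Standard Deviation measures the amount of variation or dispersion from the Mean value. By definition, $\text{standard deviation} = \sqrt{\text{variance}}$.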

Skewness
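$$\text{skewness} = \frac{\mu_3}{\sigma^3} = \frac{\frac{1}{N_p} \sum_{i=1}^{N_p} (\mathbf{X}(i) - \bar{X})^3}{\left( \sqrt{\frac{1}{N_p} \sum_{i=1}^{N_p} (\mathbf{X}(i) - \bar{X})^2} \right)^3}$$
where $\mu_3$ is the 3rd central moment. Skewness measures the asymmetry of the distribution of values about the Mean value. Depending on where the tail is elongated and the mass of the distribution is concentrated, this value can be positive or negative.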

Kurtosis
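$$\text{kurtosis} = \frac{\mu_4}{\sigma^4} = \frac{\frac{1}{N_p} \sum_{i=1}^{N_p} (\mathbf{X}(i) - \bar{X})^4}{\left( \frac{1}{N_p} \sum_{i=1}^{N_p} (\mathbf{X}(i) - \bar{X})^2 \right)^2}$$
where $\mu_4$ is the 4th central moment.
Kurtosis is a measure of the 'peakedness' of the distribution of values in the image ROI. A higher kurtosis implies that the mass of the distribution is concentrated towards the tail(s) rather than towards the mean. A lower kurtosis implies the reverse: that the mass of the distribution is concentrated towards a spike near the Mean value.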
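Skewness and kurtosis are both ratios of central moments, which the sketch below makes explicit. It is a minimal plain-Python illustration with hypothetical sample lists; note that kurtosis is computed here without subtracting 3 (that is, not as excess kurtosis), which is an assumption about the convention used.

```python
def central_moment(values, k):
    """k-th central moment using the population (1/N) convention."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** k for v in values) / len(values)

def skewness(values):
    """Third central moment divided by the cubed standard deviation."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5

def kurtosis(values):
    """Fourth central moment divided by the squared variance (no -3 offset)."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2

symmetric = [1, 2, 3, 4, 5]       # hypothetical, symmetric about 3
right_tailed = [1, 1, 1, 1, 6]    # hypothetical, long tail towards high values

skew_symmetric = skewness(symmetric)
skew_right = skewness(right_tailed)
kurt_symmetric = kurtosis(symmetric)
```

The symmetric sample has zero skewness, while the sample with a tail towards high values yields a positive skewness.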

Variance
$$\text{variance} = \frac{1}{N_p} \sum_{i=1}^{N_p} (\mathbf{X}(i) - \bar{X})^2$$
Variance is the mean of the squared distances of each intensity value from the Mean value. This is a measure of the spread of the distribution about the mean. By definition, $\text{variance} = \sigma^2$.

Uniformity
$$\text{uniformity} = \sum_{i=1}^{N_g} p(i)^2$$
Uniformity is the sum of the squares of the normalized histogram values $p(i)$, i.e. of the relative frequency of each discrete intensity level. This is a measure of the homogeneity of the image array, where a greater uniformity implies a greater homogeneity or a smaller range of discrete intensity values.
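To make the link between Uniformity and the normalized histogram concrete, here is a minimal plain-Python sketch. The function name and the three sample inputs are hypothetical; the inputs are assumed to be already discretized gray levels.

```python
from collections import Counter

def uniformity(gray_levels):
    """Sum of squared relative frequencies of the discrete gray levels."""
    n_voxels = len(gray_levels)
    histogram = Counter(gray_levels)
    return sum((count / n_voxels) ** 2 for count in histogram.values())

flat_region = [3, 3, 3, 3]      # a single gray level
two_levels = [1, 1, 2, 2]       # two equally frequent levels
four_levels = [1, 2, 3, 4]      # every voxel has a different level

u_flat = uniformity(flat_region)
u_two = uniformity(two_levels)
u_four = uniformity(four_levels)
```

The value is 1 for a perfectly homogeneous region and decreases towards $1/N_g$ as the voxels spread evenly over more gray levels.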

Shape Features (3D), N=14
In this group of features we included descriptors of the three-dimensional size and shape of the ROI. These features are independent of the gray level intensity distribution in the ROI and are therefore only calculated on the non-derived image and mask.
Unless otherwise specified, features are derived from the approximated shape defined by the triangle mesh. To build this mesh, vertices (points) are first defined as points halfway on an edge between a voxel included in the ROI and one outside the ROI. By connecting these vertices a mesh of connected triangles is obtained, with each triangle defined by 3 adjacent vertices and sharing each side with exactly one other triangle.
This mesh is generated using a marching cubes algorithm. In this algorithm, a 2x2x2 cube is moved through the mask space. For each position, the 8 corners of the cube are marked 'segmented' (1) or 'not segmented' (0). Treating the corners as specific bits in a binary number, a unique cube-index is obtained (0-255). This index is then used to determine which triangles are present in the cube, as defined in a lookup table. These triangles are defined in such a way that the normals (obtained from the cross product of vectors describing 2 out of 3 edges) are always oriented in the same direction.
Let:
• $N_v$ represent the number of voxels included in the ROI
• $N_f$ represent the number of faces (triangles) defining the mesh

Mesh Volume
$$V_i = \frac{Oa_i \cdot (Ob_i \times Oc_i)}{6}, \qquad V = \sum_{i=1}^{N_f} V_i$$
The volume of the ROI is calculated from the triangle mesh of the ROI. For each face $i$ in the mesh, defined by points $a_i$, $b_i$ and $c_i$, the (signed) volume $V_i$ of the tetrahedron defined by that face and the origin of the image ($O$) is calculated. The sign of the volume is determined by the sign of the normal, which must be consistently defined as either facing outward or inward of the ROI. Summing all $V_i$ then yields the total volume $V$ of the ROI.
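The signed-tetrahedron summation can be sketched in a few lines. The example below is a minimal plain-Python illustration on a hypothetical mesh (a single tetrahedron of volume 1/6, placed away from the origin, with outward-facing triangles); it is not the marching cubes mesh of an actual ROI.

```python
def cross(u, v):
    """Cross product of two 3D vectors."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def dot(u, v):
    """Dot product of two 3D vectors."""
    return sum(a * b for a, b in zip(u, v))

def mesh_volume(vertices, faces):
    """Sum the signed volumes of the tetrahedra (origin, a, b, c) over all faces."""
    total = 0.0
    for ia, ib, ic in faces:
        a, b, c = vertices[ia], vertices[ib], vertices[ic]
        total += dot(a, cross(b, c)) / 6.0
    # The overall sign only reflects the (consistent) orientation of the normals.
    return abs(total)

# Hypothetical mesh: tetrahedron with corner at (1, 1, 1) and unit-length legs.
vertices = [(1, 1, 1), (2, 1, 1), (1, 2, 1), (1, 1, 2)]
# Vertex order chosen so that every face normal points outward.
faces = [(0, 2, 1), (0, 1, 3), (0, 3, 2), (1, 2, 3)]

volume = mesh_volume(vertices, faces)
```

Individual tetrahedra may have negative signed volume (three of the four do here); only with a consistent orientation do the contributions outside the shape cancel.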

Voxel Volume
$$V_{\text{voxel}} = \sum_{k=1}^{N_v} V_k$$
The volume of the ROI is approximated by multiplying the number of voxels in the ROI by the volume of a single voxel $V_k$. This is a less precise approximation of the volume. This feature does not make use of the mesh and is not used in the calculation of other shape features.

Surface Area
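$$A_i = \frac{1}{2} \left| a_i b_i \times a_i c_i \right|, \qquad A = \sum_{i=1}^{N_f} A_i$$
where $a_i b_i$ and $a_i c_i$ are edges of the $i$th triangle in the mesh, formed by vertices $a_i$, $b_i$ and $c_i$. The surface area $A_i$ of each triangle is calculated first, and the total surface area $A$ is the sum of these sub-areas.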
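The per-triangle area computation can be illustrated as follows. This is a minimal plain-Python sketch on a hypothetical tetrahedral mesh; the function names are illustrative only.

```python
import math

def subtract(p, q):
    """Vector from q to p."""
    return tuple(a - b for a, b in zip(p, q))

def cross(u, v):
    """Cross product of two 3D vectors."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def mesh_surface_area(vertices, faces):
    """Sum of triangle areas, each equal to half the norm of the edge cross product."""
    total = 0.0
    for ia, ib, ic in faces:
        a, b, c = vertices[ia], vertices[ib], vertices[ic]
        normal = cross(subtract(b, a), subtract(c, a))
        total += 0.5 * math.sqrt(sum(x * x for x in normal))
    return total

# Hypothetical mesh: tetrahedron with three unit-length legs at the origin.
vertices = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)]
faces = [(0, 2, 1), (0, 1, 3), (0, 3, 2), (1, 2, 3)]

area = mesh_surface_area(vertices, faces)
```

Three faces are right triangles of area 0.5 and the fourth is an equilateral triangle with side $\sqrt{2}$, so the total is $1.5 + \sqrt{3}/2$.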

Surface Area to Volume ratio
$$\text{surface to volume ratio} = \frac{A}{V}$$
Here, $A$ is the surface area and $V$ the volume of the mesh. A lower value indicates a more compact (sphere-like) shape. This feature is not dimensionless, and is therefore (partly) dependent on the volume of the ROI.

Sphericity
$$\text{sphericity} = \frac{\sqrt[3]{36 \pi V^2}}{A}$$
Sphericity is a measure of the roundness of the shape of the tumor region relative to a sphere. It is a dimensionless measure, independent of scale and orientation. The value range is $0 < \text{sphericity} \leq 1$, where a value of 1 indicates a perfect sphere (a sphere has the smallest possible surface area for a given volume, compared to other solids).
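The contrast between the dimensionless sphericity and the scale-dependent surface-to-volume ratio can be checked on analytic shapes. The sketch below is a minimal plain-Python illustration using closed-form volumes and areas of hypothetical solids rather than a mesh.

```python
import math

def sphericity(volume, surface_area):
    """Cube root of 36*pi*V^2, divided by A; equals 1 for a sphere."""
    return (36.0 * math.pi * volume ** 2) ** (1.0 / 3.0) / surface_area

def surface_to_volume_ratio(volume, surface_area):
    """A / V; depends on the size of the object, not only on its shape."""
    return surface_area / volume

def sphere(radius):
    """Analytic (volume, surface area) of a sphere."""
    return 4.0 / 3.0 * math.pi * radius ** 3, 4.0 * math.pi * radius ** 2

def cube(side):
    """Analytic (volume, surface area) of a cube."""
    return side ** 3, 6.0 * side ** 2

sphericity_sphere = sphericity(*sphere(2.0))
sphericity_small_cube = sphericity(*cube(1.0))
sphericity_large_cube = sphericity(*cube(10.0))
ratio_small_cube = surface_to_volume_ratio(*cube(1.0))
ratio_large_cube = surface_to_volume_ratio(*cube(10.0))
```

Scaling the cube leaves its sphericity unchanged but lowers its surface-to-volume ratio tenfold, which is why the latter is described as volume-dependent.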

Maximum 3D diameter
Maximum 3D diameter is defined as the largest pairwise Euclidean distance between tumor surface mesh vertices.

Maximum 2D diameter (Slice)
Maximum 2D diameter (Slice) is defined as the largest pairwise Euclidean distance between tumor surface mesh vertices in the row-column (generally the axial) plane.

Maximum 2D diameter (Column)
Maximum 2D diameter (Column) is defined as the largest pairwise Euclidean distance between tumor surface mesh vertices in the row-slice (usually the coronal) plane.

Maximum 2D diameter (Row)
Maximum 2D diameter (Row) is defined as the largest pairwise Euclidean distance between tumor surface mesh vertices in the column-slice (usually the sagittal) plane.

Major Axis Length
$$\text{major axis} = 4 \sqrt{\lambda_{\text{major}}}$$
This feature yields the largest axis length of the ROI-enclosing ellipsoid and is calculated using the largest principal component $\lambda_{\text{major}}$. The principal component analysis is performed using the physical coordinates of the voxel centers defining the ROI. It therefore takes spacing into account, but does not make use of the shape mesh.

Minor Axis Length
$$\text{minor axis} = 4 \sqrt{\lambda_{\text{minor}}}$$
This feature yields the second-largest axis length of the ROI-enclosing ellipsoid and is calculated using the second-largest principal component $\lambda_{\text{minor}}$. The principal component analysis is performed using the physical coordinates of the voxel centers defining the ROI. It therefore takes spacing into account, but does not make use of the shape mesh.

Least Axis Length
$$\text{least axis} = 4 \sqrt{\lambda_{\text{least}}}$$
This feature yields the smallest axis length of the ROI-enclosing ellipsoid and is calculated using the smallest principal component $\lambda_{\text{least}}$. In case of a 2D segmentation, this value will be 0.

Elongation
Elongation shows the relationship between the two largest principal components in the ROI shape. For computational reasons, this feature is defined as the inverse of true elongation.

$$\text{elongation} = \sqrt{\frac{\lambda_{\text{minor}}}{\lambda_{\text{major}}}}$$
Here, $\lambda_{\text{major}}$ and $\lambda_{\text{minor}}$ are the lengths of the largest and second-largest principal component axes. The values range between 1 (where the cross section through the first and second largest principal moments is circle-like, i.e. non-elongated) and 0 (where the object is maximally elongated, i.e. a 1-dimensional line).

Flatness
Flatness shows the relationship between the largest and smallest principal components in the ROI shape. For computational reasons, this feature is defined as the inverse of true flatness.

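$$\text{flatness} = \sqrt{\frac{\lambda_{\text{least}}}{\lambda_{\text{major}}}}$$
Here, $\lambda_{\text{major}}$ and $\lambda_{\text{least}}$ are the lengths of the largest and smallest principal component axes. The values range between 1 (non-flat, sphere-like) and 0 (a flat object, or single-slice segmentation).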
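The axis lengths, elongation and flatness all follow from the three principal component eigenvalues. The sketch below assumes those eigenvalues are already available (the principal component analysis itself is not shown) and uses hypothetical values; the function name is illustrative.

```python
import math

def pca_shape_features(eigenvalues):
    """Axis lengths and inverse elongation/flatness from three PCA eigenvalues."""
    lam_least, lam_minor, lam_major = sorted(eigenvalues)
    return {
        "major_axis": 4.0 * math.sqrt(lam_major),
        "minor_axis": 4.0 * math.sqrt(lam_minor),
        "least_axis": 4.0 * math.sqrt(lam_least),
        # Both ratios lie in [0, 1]; 1 means the two axes have equal length.
        "elongation": math.sqrt(lam_minor / lam_major),
        "flatness": math.sqrt(lam_least / lam_major),
    }

# Hypothetical eigenvalues (in squared physical units), given in arbitrary order.
features = pca_shape_features([1.0, 4.0, 0.25])
```

Because both ratios are defined as inverses, elongation equals minor axis / major axis and flatness equals least axis / major axis.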

Gray Level Co-occurrence Matrix (GLCM) Features, N=24
A Gray Level Co-occurrence Matrix (GLCM) of size $N_g \times N_g$ describes the second-order joint probability function of an image region constrained by the mask and is defined as $\mathbf{P}(i,j|\delta,\theta)$. The $(i,j)$th element of this matrix represents the number of times the combination of levels $i$ and $j$ occurs in two pixels in the image that are separated by a distance of $\delta$ pixels along angle $\theta$. The distance $\delta$ from the center voxel is defined as the distance according to the infinity norm. For $\delta=1$, this results in 2 neighbors for each of 13 angles in 3D (26-connectivity) and for $\delta=2$ a 98-connectivity (49 unique angles).
As a two dimensional example, let the following matrix $\mathbf{I}$ represent a 5x5 image, having 5 discrete grey levels:
$$\mathbf{I} = \begin{bmatrix} 1 & 2 & 5 & 2 & 3 \\ 3 & 2 & 1 & 3 & 1 \\ 1 & 3 & 5 & 5 & 2 \\ 1 & 1 & 1 & 1 & 2 \\ 1 & 2 & 4 & 3 & 5 \end{bmatrix}$$
For distance $\delta=1$ (considering pixels with a distance of 1 pixel from each other) and angle $\theta=0$ (horizontal plane, i.e. voxels to the left and right of the center voxel), the following symmetrical GLCM is obtained:
$$\mathbf{P} = \begin{bmatrix} 6 & 4 & 3 & 0 & 0 \\ 4 & 0 & 2 & 1 & 3 \\ 3 & 2 & 0 & 1 & 2 \\ 0 & 1 & 1 & 0 & 0 \\ 0 & 3 & 2 & 0 & 2 \end{bmatrix}$$
By default, the value of a feature is calculated on the GLCM for each angle separately, after which the mean of these values is returned. If distance weighting is enabled, GLCM matrices are weighted by weighting factor $W$ and then summed and normalized. Features are then calculated on the resultant matrix. Weighting factor $W$ is calculated for the distance between neighboring voxels by $W = e^{-\|d\|^2}$, where $d$ is the distance for the associated angle according to the norm specified in the setting 'weightingNorm'.
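The counting rule for the horizontal, symmetric GLCM can be sketched as below. This is a minimal plain-Python illustration applied to the 5x5 example image from the text; the function name `glcm_horizontal` is hypothetical, and gray levels are assumed to be the integers 1 to `n_levels`.

```python
def glcm_horizontal(image, n_levels, distance=1):
    """Symmetric co-occurrence counts for pixel pairs along the row direction."""
    glcm = [[0] * n_levels for _ in range(n_levels)]
    for row in image:
        # Pair every pixel with the pixel `distance` columns to its right.
        for left, right in zip(row, row[distance:]):
            glcm[left - 1][right - 1] += 1
            glcm[right - 1][left - 1] += 1  # count the mirrored pair as well
    return glcm

image = [
    [1, 2, 5, 2, 3],
    [3, 2, 1, 3, 1],
    [1, 3, 5, 5, 2],
    [1, 1, 1, 1, 2],
    [1, 2, 4, 3, 5],
]

glcm = glcm_horizontal(image, n_levels=5)
total_pairs = sum(sum(row) for row in glcm)
```

Each of the 20 horizontally adjacent pairs is counted in both directions, so the entries sum to 40; dividing by this total gives the normalized matrix $p(i,j)$ on which the features are computed.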

Autocorrelation
Autocorrelation is a measure of the magnitude of the fineness and coarseness of texture.

Joint Average
Returns the mean gray level intensity of the distribution.

Cluster Prominence
Cluster Prominence is a measure of the skewness and asymmetry of the GLCM. A higher value implies more asymmetry about the mean, while a lower value indicates a peak near the mean value and less variation about the mean.

Cluster Shade
Cluster Shade is a measure of the skewness and uniformity of the GLCM. A higher cluster shade implies greater asymmetry about the mean.

Cluster Tendency
Cluster Tendency is a measure of groupings of voxels with similar gray-level values.

Contrast
Contrast is a measure of the local intensity variation, favoring values away from the diagonal $(i = j)$. A larger value correlates with a greater disparity in intensity values among neighboring voxels.

Correlation
Correlation is a value between 0 (uncorrelated) and 1 (perfectly correlated) showing the linear dependency of gray level values to their respective voxels in the GLCM.

Difference Average
Difference Average measures the relationship between occurrences of pairs with similar intensity values and occurrences of pairs with differing intensity values.

Difference Entropy
Difference Entropy is a measure of the randomness/variability in neighborhood intensity value differences.

Difference Variance
Difference Variance is a measure of heterogeneity that places higher weights on differing intensity level pairs that deviate more from the mean.

Joint Energy
Energy is a measure of homogeneous patterns in the image. A greater Energy implies that there are more instances of intensity value pairs in the image that neighbor each other at higher frequencies.

Joint Entropy
Joint entropy is a measure of the randomness/variability in neighborhood intensity values.

Informational Measure of Correlation (IMC) 1
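$$\text{IMC 1} = \frac{HXY - HXY1}{\max\{HX, HY\}}$$
Here, $p(i,j)$ is the normalized GLCM with marginal distributions $p_x(i)$ and $p_y(j)$; $HX$ and $HY$ are the entropies of $p_x$ and $p_y$, $HXY = -\sum_i \sum_j p(i,j) \log_2 p(i,j)$ is the joint entropy, and $HXY1 = -\sum_i \sum_j p(i,j) \log_2 (p_x(i) p_y(j))$.
IMC1 assesses the correlation between the probability distributions of $i$ and $j$ (quantifying the complexity of the texture), using mutual information $I(i, j) = HXY1 - HXY$. In this formula the numerator is defined as $HXY - HXY1$, i.e. $-I(i, j)$, and is therefore $\leq 0$. This reflects how this feature is defined in the original Haralick paper. In the case where the distributions are independent, there is no mutual information and the result will therefore be 0. In the case of uniform distribution with complete dependence, mutual information will be equal to $\log_2(N_g)$. Finally, $HXY - HXY1$ is divided by the maximum of the 2 marginal entropies, where in the latter case of complete dependence (not necessarily uniform; low complexity) it will result in $\text{IMC1} = -1$, as $HX = HY = I(i, j)$.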

Informational Measure of Correlation (IMC) 2
$$\text{IMC 2} = \sqrt{1 - e^{-2 (HXY2 - HXY)}}$$
Here, $HXY$ is the joint entropy of the normalized GLCM and $HXY2 = -\sum_i \sum_j p_x(i) p_y(j) \log_2 (p_x(i) p_y(j))$, with $p_x$ and $p_y$ the marginal distributions.
IMC2 also assesses the correlation between the probability distributions of $i$ and $j$ (quantifying the complexity of the texture). Of interest is to note that $HXY1 = HXY2$ and that $HXY2 - HXY \geq 0$ represents the mutual information of the 2 distributions. Therefore, the range of IMC2 is $[0, 1)$, with 0 representing the case of 2 independent distributions (no mutual information) and the maximum value representing the case of 2 fully dependent and uniform distributions (maximal mutual information, equal to $\log_2(N_g)$).
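The two limiting cases discussed for IMC1 and IMC2 (independent versus fully dependent distributions) can be verified numerically. The following is a minimal plain-Python sketch; the function name, the two 2x2 input matrices and the zero-entropy guard (returning 0 for a flat region) are assumptions made for this illustration.

```python
import math

def informational_measures_of_correlation(glcm):
    """Return (IMC1, IMC2) for a square co-occurrence count matrix."""
    total = sum(sum(row) for row in glcm)
    n = len(glcm)
    p = [[value / total for value in row] for row in glcm]
    px = [sum(p[i][j] for j in range(n)) for i in range(n)]
    py = [sum(p[i][j] for i in range(n)) for j in range(n)]

    def entropy(probabilities):
        # Zero-probability terms are skipped (their limit contribution is 0).
        return -sum(q * math.log2(q) for q in probabilities if q > 0)

    hx, hy = entropy(px), entropy(py)
    hxy = entropy(value for row in p for value in row)
    hxy1 = -sum(p[i][j] * math.log2(px[i] * py[j])
                for i in range(n) for j in range(n) if p[i][j] > 0)
    hxy2 = -sum(px[i] * py[j] * math.log2(px[i] * py[j])
                for i in range(n) for j in range(n) if px[i] * py[j] > 0)

    denominator = max(hx, hy)
    # Assumed convention: a flat region (zero entropy) yields IMC1 = 0.
    imc1 = (hxy - hxy1) / denominator if denominator > 0 else 0.0
    imc2 = math.sqrt(max(0.0, 1.0 - math.exp(-2.0 * (hxy2 - hxy))))
    return imc1, imc2

independent = [[1, 1], [1, 1]]   # p(i,j) = p_x(i) * p_y(j)
dependent = [[5, 0], [0, 5]]     # i fully determines j, uniform marginals

imc_independent = informational_measures_of_correlation(independent)
imc_dependent = informational_measures_of_correlation(dependent)
```

For the fully dependent 2x2 case the mutual information is $\log_2(2) = 1$, giving IMC1 = -1 and IMC2 = $\sqrt{1 - e^{-2}}$.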

Inverse Difference Moment (IDM)
IDM (a.k.a. Homogeneity 2) is a measure of the local homogeneity of an image. IDM weights are the inverse of the Contrast weights (decreasing with the squared distance from the diagonal $i = j$ in the GLCM).

Maximum Correlation Coefficient
The Maximum Correlation Coefficient is a measure of complexity of the texture. In case of a flat region, each GLCM matrix has shape (1, 1), resulting in just 1 eigenvalue. In this case, an arbitrary value of 1 is returned.

Inverse Difference Moment Normalized (IDMN)
IDMN (inverse difference moment normalized) is a measure of the local homogeneity of an image. IDMN weights are the inverse of the Contrast weights (decreasing with the squared distance from the diagonal $i = j$ in the GLCM). Unlike Homogeneity 2 (IDM), IDMN normalizes the square of the difference between neighboring intensity values by dividing by the square of the total number of discrete intensity values.

Inverse Difference (ID)
ID (a.k.a. Homogeneity 1) is another measure of the local homogeneity of an image. With more uniform gray levels, the denominator will remain low, resulting in a higher overall value.

Inverse Difference Normalized (IDN)
IDN (inverse difference normalized) is another measure of the local homogeneity of an image. Unlike Homogeneity 1 (ID), IDN normalizes the difference between the neighboring intensity values by dividing by the total number of discrete intensity values.

Inverse Variance
$$\text{inverse variance} = \sum_{k=1}^{N_g-1} \frac{p_{x-y}(k)}{k^2}$$
Here, $p_{x-y}(k)$ is the sum of all normalized GLCM entries $p(i,j)$ with $|i-j| = k$. Note that $k=0$ is skipped, as this would result in a division by 0.

Maximum Probability
$$\text{maximum probability} = \max(p(i,j))$$
Maximum Probability is the frequency of occurrence of the most predominant pair of neighboring intensity values.

Sum Average
Sum Average measures the relationship between occurrences of pairs with lower intensity values and occurrences of pairs with higher intensity values.

Sum Entropy
Sum Entropy is a sum of neighborhood intensity value differences.

Sum of Squares
Sum of Squares, or Variance, is a measure of the distribution of neighboring intensity level pairs about the mean intensity level in the GLCM.

Gray Level Size Zone Matrix (GLSZM) Features, N=16
A Gray Level Size Zone Matrix (GLSZM) quantifies gray level zones in an image. A gray level zone is defined as a group of connected voxels that share the same gray level intensity, and its size is the number of voxels in the group. A voxel is considered connected if the distance is 1 according to the infinity norm (26-connected region in 3D, 8-connected region in 2D). In a gray level size zone matrix $\mathbf{P}(i,j)$, the $(i,j)$th element equals the number of zones with gray level $i$ and size $j$ that appear in the image. Contrary to GLCM and GLRLM, the GLSZM is rotation independent, with only one matrix calculated for all directions in the ROI. As a two dimensional example, consider the following 5x5 image, with 5 discrete gray levels:
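A minimal sketch of how a GLSZM can be built in 2D with 8-connectivity is given below, in plain Python. The 5x5 image used in the code is an assumed illustrative input and not necessarily the example intended by the original text; the function name is hypothetical and gray levels are assumed to be the integers 1 to `n_levels`.

```python
def gray_level_size_zone_matrix(image, n_levels):
    """Count 8-connected zones per (gray level, zone size) using a flood fill."""
    n_rows, n_cols = len(image), len(image[0])
    visited = [[False] * n_cols for _ in range(n_rows)]
    zones = []  # one (gray level, size) entry per zone
    for r in range(n_rows):
        for c in range(n_cols):
            if visited[r][c]:
                continue
            level = image[r][c]
            visited[r][c] = True
            stack = [(r, c)]
            size = 0
            while stack:
                y, x = stack.pop()
                size += 1
                # Visit all neighbors at infinity-norm distance 1.
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < n_rows and 0 <= nx < n_cols
                                and not visited[ny][nx]
                                and image[ny][nx] == level):
                            visited[ny][nx] = True
                            stack.append((ny, nx))
            zones.append((level, size))
    max_size = max(size for _, size in zones)
    matrix = [[0] * max_size for _ in range(n_levels)]
    for level, size in zones:
        matrix[level - 1][size - 1] += 1
    return matrix

# Assumed illustrative image with 5 discrete gray levels.
image = [
    [5, 2, 5, 4, 4],
    [3, 3, 3, 1, 3],
    [2, 1, 1, 1, 3],
    [4, 2, 2, 2, 3],
    [3, 5, 3, 3, 2],
]

glszm = gray_level_size_zone_matrix(image, n_levels=5)
n_zones = sum(sum(row) for row in glszm)
zone_percentage = n_zones / 25
```

Rows index the gray level and columns the zone size; for this input there are 11 zones over 25 pixels.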

Small Area Emphasis (SAE)
SAE is a measure of the distribution of small size zones, with a greater value indicative of more smaller size zones and more fine textures.

Large Area Emphasis (LAE)
LAE is a measure of the distribution of large area size zones, with a greater value indicative of more larger size zones and more coarse textures.

Gray Level Non-Uniformity (GLN)
GLN measures the variability of gray-level intensity values in the image, with a lower value indicating more homogeneity in intensity values.

Gray Level Non-Uniformity Normalized (GLNN)
GLNN measures the variability of gray-level intensity values in the image, with a lower value indicating a greater similarity in intensity values. This is the normalized version of the GLN formula.

Size-Zone Non-Uniformity (SZN)
SZN measures the variability of size zone volumes in the image, with a lower value indicating more homogeneity in size zone volumes.

Size-Zone Non-Uniformity Normalized (SZNN)
SZNN measures the variability of size zone volumes throughout the image, with a lower value indicating more homogeneity among zone size volumes in the image. This is the normalized version of the SZN formula.

Zone Percentage (ZP)
$$\text{ZP} = \frac{N_z}{N_p}$$
ZP measures the coarseness of the texture by taking the ratio of the number of zones $N_z$ and the number of voxels $N_p$ in the ROI.

Gray Level Variance (GLV)
GLV measures the variance in gray level intensities for the zones.

Zone Variance (ZV)
ZV measures the variance in zone size volumes for the zones.

Zone Entropy (ZE)
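$$\text{ZE} = -\sum_{i=1}^{N_g} \sum_{j=1}^{N_s} p(i,j) \log_2 (p(i,j) + \epsilon)$$
Here, $p(i,j) = \mathbf{P}(i,j)/N_z$ is the normalized size zone matrix, $N_s$ is the number of discrete zone sizes, and $\epsilon$ is an arbitrarily small positive number ($\approx 2.2 \times 10^{-16}$). ZE measures the uncertainty/randomness in the distribution of zone sizes and gray levels. A higher value indicates more heterogeneity in the texture patterns.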

Low Gray Level Zone Emphasis (LGLZE)
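$$\text{LGLZE} = \frac{\sum_{i=1}^{N_g} \sum_{j=1}^{N_s} \frac{\mathbf{P}(i,j)}{i^2}}{N_z}$$
Here, $N_s$ is the number of discrete zone sizes and $N_z$ the number of zones in the ROI. LGLZE measures the distribution of lower gray-level size zones, with a higher value indicating a greater proportion of lower gray-level values and size zones in the image.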

High Gray Level Zone Emphasis (HGLZE)
HGLZE measures the distribution of the higher gray-level values, with a higher value indicating a greater proportion of higher gray-level values and size zones in the image.

Small Area Low Gray Level Emphasis (SALGLE)
SALGLE measures the proportion in the image of the joint distribution of smaller size zones with lower gray-level values.

Small Area High Gray Level Emphasis (SAHGLE)
SAHGLE measures the proportion in the image of the joint distribution of smaller size zones with higher gray-level values.

Large Area Low Gray Level Emphasis (LALGLE)
LALGLE measures the proportion in the image of the joint distribution of larger size zones with lower gray-level values.

Large Area High Gray Level Emphasis (LAHGLE)
LAHGLE measures the proportion in the image of the joint distribution of larger size zones with higher gray-level values.

Gray Level Run Length Matrix (GLRLM) Features, N=16
A Gray Level Run Length Matrix (GLRLM) quantifies gray level runs, which are defined as the length, in number of pixels, of consecutive pixels that have the same gray level value. In a gray level run length matrix $\mathbf{P}(i,j|\theta)$, the $(i,j)$th element describes the number of runs with gray level $i$ and length $j$ that occur in the image (ROI) along angle $\theta$. As a two dimensional example, consider the following 5x5 image, with 5 discrete gray levels:
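A minimal sketch of the run counting along the horizontal direction is given below, in plain Python. The 5x5 image in the code is an assumed illustrative input and not necessarily the example intended by the original text; the function name is hypothetical and gray levels are assumed to be the integers 1 to `n_levels`.

```python
from itertools import groupby

def glrlm_horizontal(image, n_levels):
    """Count runs per (gray level, run length) along each row."""
    max_length = len(image[0])
    matrix = [[0] * max_length for _ in range(n_levels)]
    for row in image:
        # groupby splits the row into maximal runs of identical gray levels.
        for level, run in groupby(row):
            length = len(list(run))
            matrix[level - 1][length - 1] += 1
    return matrix

# Assumed illustrative image with 5 discrete gray levels.
image = [
    [5, 2, 5, 4, 4],
    [3, 3, 3, 1, 3],
    [2, 1, 1, 1, 3],
    [4, 2, 2, 2, 3],
    [3, 5, 3, 3, 2],
]

glrlm = glrlm_horizontal(image, n_levels=5)
n_runs = sum(sum(row) for row in glrlm)
run_percentage = n_runs / 25
# Short Run Emphasis: each run is weighted by 1 / (run length squared).
short_run_emphasis = sum(
    count / (j + 1) ** 2 for row in glrlm for j, count in enumerate(row)
) / n_runs
```

Rows index the gray level and columns the run length; the run percentage and short run emphasis computed at the end follow the ratios described for RP and SRE in this section.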

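Short Run Emphasis (SRE)
$$\text{SRE} = \frac{\sum_{i=1}^{N_g} \sum_{j=1}^{N_r} \frac{\mathbf{P}(i,j|\theta)}{j^2}}{N_r(\theta)}$$
Here and in the formulas below, $N_g$ is the number of discrete gray levels, $N_r$ the number of discrete run lengths, $N_r(\theta)$ the total number of runs along angle $\theta$, and $N_p$ the number of voxels in the ROI. SRE is a measure of the distribution of short run lengths, with a greater value indicative of shorter run lengths and finer textures.

Long Run Emphasis (LRE)
$$\text{LRE} = \frac{\sum_{i=1}^{N_g} \sum_{j=1}^{N_r} \mathbf{P}(i,j|\theta) \, j^2}{N_r(\theta)}$$
LRE is a measure of the distribution of long run lengths, with a greater value indicative of longer run lengths and more coarse structural textures.

Gray Level Non-Uniformity (GLN)
$$\text{GLN} = \frac{\sum_{i=1}^{N_g} \left( \sum_{j=1}^{N_r} \mathbf{P}(i,j|\theta) \right)^2}{N_r(\theta)}$$
GLN measures the similarity of gray-level intensity values in the image, where a lower GLN value correlates with a greater similarity in intensity values.

Gray Level Non-Uniformity Normalized (GLNN)
$$\text{GLNN} = \frac{\sum_{i=1}^{N_g} \left( \sum_{j=1}^{N_r} \mathbf{P}(i,j|\theta) \right)^2}{N_r(\theta)^2}$$
GLNN measures the similarity of gray-level intensity values in the image, where a lower GLNN value correlates with a greater similarity in intensity values. This is the normalized version of the GLN formula.

Run Length Non-Uniformity (RLN)
$$\text{RLN} = \frac{\sum_{j=1}^{N_r} \left( \sum_{i=1}^{N_g} \mathbf{P}(i,j|\theta) \right)^2}{N_r(\theta)}$$
RLN measures the similarity of run lengths throughout the image, with a lower value indicating more homogeneity among run lengths in the image.

Run Length Non-Uniformity Normalized (RLNN)
$$\text{RLNN} = \frac{\sum_{j=1}^{N_r} \left( \sum_{i=1}^{N_g} \mathbf{P}(i,j|\theta) \right)^2}{N_r(\theta)^2}$$
RLNN measures the similarity of run lengths throughout the image, with a lower value indicating more homogeneity among run lengths in the image. This is the normalized version of the RLN formula.

Run Percentage (RP)
$$\text{RP} = \frac{N_r(\theta)}{N_p}$$
RP measures the coarseness of the texture by taking the ratio of the number of runs and the number of voxels in the ROI.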

Gray Level Variance (GLV)
GLV measures the variance in gray level intensity for the runs.

Run Variance (RV)
RV is a measure of the variance in runs for the run lengths.

Run Entropy (RE)
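$$\text{RE} = -\sum_{i=1}^{N_g} \sum_{j=1}^{N_r} p(i,j|\theta) \log_2 (p(i,j|\theta) + \epsilon)$$
Here, $p(i,j|\theta) = \mathbf{P}(i,j|\theta)/N_r(\theta)$ is the normalized run length matrix and $\epsilon$ is an arbitrarily small positive number ($\approx 2.2 \times 10^{-16}$). RE measures the uncertainty/randomness in the distribution of run lengths and gray levels. A higher value indicates more heterogeneity in the texture patterns.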

Low Gray Level Run Emphasis (LGLRE)
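$$\text{LGLRE} = \frac{\sum_{i=1}^{N_g} \sum_{j=1}^{N_r} \frac{\mathbf{P}(i,j|\theta)}{i^2}}{N_r(\theta)}$$
Here, $N_r(\theta)$ is the total number of runs along angle $\theta$. LGLRE measures the distribution of low gray-level values, with a higher value indicating a greater concentration of low gray-level values in the image.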

Neighboring Gray Tone Difference Matrix (NGTDM) Features, N=5
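A Neighboring Gray Tone Difference Matrix quantifies the difference between a gray value and the average gray value of its neighbors within distance $\delta$. The sum of absolute differences for gray level $i$ is stored in the matrix. Let $\mathbf{X}_{gl}$ be a set of segmented voxels and $x_{gl}(j_x,j_y,j_z) \in \mathbf{X}_{gl}$ be the gray level of a voxel at position $(j_x,j_y,j_z)$, then the average gray level of the neighborhood is:
$$\bar{A}_i = \bar{A}(j_x,j_y,j_z) = \frac{1}{W} \sum_{k_x=-\delta}^{\delta} \sum_{k_y=-\delta}^{\delta} \sum_{k_z=-\delta}^{\delta} x_{gl}(j_x+k_x, j_y+k_y, j_z+k_z),$$
$$\text{where } (k_x,k_y,k_z) \neq (0,0,0) \text{ and } x_{gl}(j_x+k_x, j_y+k_y, j_z+k_z) \in \mathbf{X}_{gl}$$
Here, $W$ is the number of voxels in the neighborhood that are also in $\mathbf{X}_{gl}$.
As a two dimensional example, let the following matrix $\mathbf{I}$ represent a 4x4 image, having 5 discrete grey levels, but no voxels with gray level 4:
Let:
• $n_i$ be the number of voxels in $\mathbf{X}_{gl}$ with gray level $i$
• $N_{v,p}$ be the total number of voxels in $\mathbf{X}_{gl}$ and equal to $\sum n_i$ (i.e. the number of voxels with a valid region; at least 1 neighbor). $N_{v,p} \leq N_p$, where $N_p$ is the total number of voxels in the ROI
• $p_i$ be the gray level probability and equal to $n_i/N_{v,p}$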
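The quantities $n_i$, $p_i$ and the per-level sum of absolute differences can be sketched as follows for the 2D case with $\delta = 1$. This is a minimal plain-Python illustration; the 4x4 image in the code is an assumed input chosen to match the description (5 gray levels, none equal to 4), not necessarily the original example, and the whole image is assumed to lie inside the ROI.

```python
def ngtdm(image, distance=1):
    """Return (n, s): voxel count and summed |i - neighborhood mean| per gray level."""
    n_rows, n_cols = len(image), len(image[0])
    counts, sums = {}, {}
    for r in range(n_rows):
        for c in range(n_cols):
            # Neighbors within `distance` (infinity norm), excluding the center.
            neighbors = [
                image[ny][nx]
                for ny in range(max(0, r - distance), min(n_rows, r + distance + 1))
                for nx in range(max(0, c - distance), min(n_cols, c + distance + 1))
                if (ny, nx) != (r, c)
            ]
            if not neighbors:
                continue  # voxels without any neighbor have no valid region
            level = image[r][c]
            mean = sum(neighbors) / len(neighbors)
            counts[level] = counts.get(level, 0) + 1
            sums[level] = sums.get(level, 0.0) + abs(level - mean)
    return counts, sums

# Assumed illustrative image: 5 gray levels, no voxel with gray level 4.
image = [
    [1, 2, 5, 2],
    [3, 5, 1, 3],
    [1, 3, 5, 5],
    [3, 1, 1, 1],
]

n, s = ngtdm(image)
n_valid = sum(n.values())
p = {level: count / n_valid for level, count in n.items()}
```

Gray level 4 never occurs in this input, so it has no entry in `n` or `s`, which corresponds to $n_4 = 0$ and a zero sum of differences.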