Exploring Optimality of Piecewise Polynomial Interpolation Functions for Lung Field Modeling in 2D Chest X-Ray Images

In this paper, a landmark based approach, using five different interpolating polynomials (linear, cubic convolution, cubic spline, PCHIP, and Makima) for modeling of lung field region in 2D chest X-ray images have been presented. Japanese Society of Radiological Technology (JSRT) database which is publicly available has been used for evaluation of the proposed method. Selected radiographs are anatomically landmarked using 17 and 16 anatomical landmark points to represent left and right lung field regions, respectively. Local, piecewise polynomial interpolation is then employed to create additional semilandmark points to form the lung contour. Jaccard similarity coefficients and Dice coefficients have been used to find accuracy of the modeled shape through comparison with the prepared ground truth. With the optimality condition of three intermediate semilandmark points, PCHIP interpolation method with an execution time of 5.04873 s is found to be the most promising candidate for lung field modeling with an average Dice coefficient (DC) of 98.20 and 98.54% (for the left and right lung field, respectively) and with the average Jaccard similarity coefficient (JSC) of 96.47 and 97.13% for these two lung field regions. While performance of Makima and cubic convolution is close to the PCHIP with the same optimality condition, i.e., three intermediate semilandmark points, the optimality condition for the cubic spline method is of at least seven intermediate semilandmark points which, however, does not result in better performance in terms of accuracy or execution time.


INTRODUCTION
The chest X-ray imaging is still one of the most preferred techniques that radiologists and medical practitioners use to diagnose the lung diseases in their daily routine checkups due to its low cost and easy availability. Due to this reason, the accurate detection and segmentation of the lung field region are of prime importance for any biomedical image analysis procedures [1][2][3]. Delineation of the lung field is a prerequisite for any chest-image analysis procedure. However, delineation is a very tedious and time-consuming procedure that may be prone to subjective bias. Therefore, an automated solution for the lung field segmentation is needed [4]. The development of an automated solution for the lung field segmentation is challenging due to intensity variations across the edges and overlapping of the other anatomical structures. As chest X-rays are low contrast images, the lung regions cannot be differentiated from the background and hence the classical approaches like thresholding and edge detection algorithms are not sufficient enough for the lung field segmentation. The author in Ref. [5] provided an intuitive method to solve this problem by representing a shape as a set of discrete labeled points and referred to these points as landmarks. Each labeled point represents a particular part of the shape or its boundary and captures certain distribution in the shape space. But in practice, it is very time-consuming. Labeling a set of landmark points to model a shape in the shape space is referred as point distribution model (PDM) [6,7]. The modeled shape takes a considerable amount of landmark points to form a curve. Thus, the point distribution model inevitably reduces the number of points that may use to represent a shape. Clearly, these labeled sets or the number of landmark points do not represent any salient feature of anatomical significance, but a part of the curve or boundary of the anatomical shape. However, this method (PDM) has some undesirable consequences. First, users are forced to put many landmark points to smooth out the curve in every training example. Second, the landmark points may lose their characteristics as true landmark point as they do not represent any salient feature of the object. The above said problem can be minimized by taking only a few landmark points that must characterize some anatomical significance of the lung field region. These significant anatomical landmark points are then interpolated to get a close approximation of the original lung field shape. The accurate delineation of the lung field requires the anatomical knowledge of chest radiography to incorporate the expected shape that may work as a priori information in active shape modeling (ASM).
To delineate the lung fields, the author in Ref. [8] used 50 landmark points for the left lung and 44 landmark points for the right lung representation. They also used 26 landmark points for the heart shape representation and a pair of 23 landmark points for the left and right clavicle's (left and right) representation. Each object and landmark point was defined by manual annotation, and no interpolation method was employed. The author in Ref. [9] proposed a lung segmentation methodology by capturing salient points around the lung fields by subsequent application of simple intensity and edge feature extraction techniques. The detected salient points are then interpolated using Bezier curves to approximate the lung field boundaries. In Ref. [10], Shao et al. presented a joint shape and appearance sparse learning method to segment out lung field from the chest radiographs. They used a total of 14 labeled points (6 for the right lung and 8 for the left lung) to get a rough idea of both the lung regions. They termed these labeled sets as "landmarks." Few more points were also annotated between these landmark points, and they termed these labeled sets as "points." However, the authors did not provide any information about the number of points they used to represent the lung field regions. The author in Ref. [11] presented a customized active shape model to extract the lung regions from the chest X-ray images. They employed an average active shape model, gray scale projection, and affine registration to obtain the initial lung contours. After that, an objective function is defined to push the vertices of the active shape model to the real lung edges to get a more balanced distance distribution of vertices. They used 44 and 50 landmark points to represent the left and the right lung regions for the lung field segmentation. The annotated set of landmark points was manually defined and interpolation was not employed. The author in Ref. [12] presented an automatic lung field segmentation method using an improved statistical shape and appearance model. They used 6 landmark points to locate each lung region and then applied a gray-level intensity based method to locate and initialize the lung shape model. They used the intensity profile model to create boundary landmark points and later these landmark points were interpolated by a cubic spline interpolation method. In Ref. [13], the authors used the set of labeled points to train their algorithm for the automatic lung field segmentation using active shape modeling (ASM) in the singlephoton emission computerized tomography (SPECT) images and validated this automatic SPECT segmentation against computed tomography (CT) images as well as manually delineated SPECT images. But the method does not explain the type of interpolation mechanism that they used. In Ref. [14], the authors used 144 annotated boundary points (72 per left/right lung) to construct a statistical shape model of the lung field. Specifically, they used six manually annotated primary landmark points based on the appearance of the lung field and then secondary landmarks were estimated along the lung contour using interpolation between the manually annotated primary landmarks. However, it is noted that the other anatomical structures that overlap with the lung field region were not excluded in their method, and the type of the interpolation method was not defined. All these authors used the landmark selection method to segment out lung field regions from the chest radiographic images. However, these authors could not explain the number of interpolating points (secondary landmarks) that is needed to approximate the lung field regions with the highest similarity index.
The landmarking method is also employed to delineate other anatomical structures like the heart, liver, femur, etc., in CT, SPECT, and other medical imaging techniques. The authors in Ref. [15] reported a general multi-resolution framework for the statistical modeling of multi-object structures. The authors in Ref. [16] proposed a dual active shape model for the segmentation of heart's right ventricle (RV) boundary. The authors in Ref. [17] presented an automatic liver segmentation method using the shape modeling methods in computed tomography (CT) images. The authors in Ref. [18] used landmark based method for extracting prostate boundaries from the 2D transrectal ultrasound (TRUS) images by using a partial active shape model (PASM). The authors in Ref. [16] introduced an ASMbased framework for motion correction of myocardial T1 mapping in magnetic resonance imaging (MRI). Clearly, landmarking is the first stage of any automated computer aided diagnostic (CAD) system that tries to identify the anatomical region. The landmark based method's major strength is that it represents a shape as a set of discrete labeled points. The landmark based method is highly applicable for the cases where the visibility of the boundary is obscured or cannot be differentiated from the background, which is always the case of low contrast imaging like X-ray and ultrasound imaging. The landmark based point distribution model finds numerous applications in active shape modeling Frontiers in Physics | www.frontiersin.org November 2021 | Volume 9 | Article 770752 (ASM), active appearance modeling, deep learning (DL), and machine learning (ML) models for the registration and segmentation applications in medical imaging. However, there are some limitations of this method that preclude their accurate clinical application. The model based methods assume that the shape information and intensity information between the training images and the new image to be segmented are similar to each other. Unfortunately, this assumption can often be invalid due to the variability of the lung's anatomy among the individuals in the application stages.
In this paper, a set of landmark points based on the anatomical properties of the left lung region and right lung field regions are explored. Our contributions in this paper are 1) identification of the number of anatomical landmark points in the thorax region for the left and right lung field modeling, 2) identification of the optimality condition of each of the interpolating polynomial to model left and right lung field region, and 3) identification of the most suited interpolating polynomial that models the left and right lung field region with highest similarity index by producing least number of secondary landmark points. Exploring the optimality of piecewise polynomials has an added advantage over selecting a random number of semilandmark points. It minimizes the number of interpolating points and hence reduces the computational complexity of the processing algorithm. Knowing the minimum number of interpolating points is highly beneficial in real-time medical imaging applications requiring less computational complexity. However, one disadvantage of landmark based study is that the number of available landmark points can sometimes be insufficient to capture the shape of an object.
The remainder of this paper is organized as follows: Section 2 provides the mathematical foundation of the lung shape modeling and Section 3 discusses the interpolation methods that mainly highlight the different categories of piecewise cubic interpolating polynomial that we have used in this literature. Section 4 discusses the methods under which the images are landmarked, and, finally, Section 5 and Section 6 discuss the simulation results and conclusion, respectively.

LUNG SHAPE MODELING
Let each shape in the training set be represented by a set of K number of landmark positions that must be consistent from one shape to the next. Here, the consistency employs that each particular landmark point must be placed on the same site as it was on the first image. That means, each particular landmark point represents a specific location of the lung field boundary.
To model a lung field shape, let each annotated landmark point be represented by the coordinate position (x j , y j ) in the two dimensional space R 2 . If each shape is annotated by K number of landmark points, then the modeled shape in terms of coordinate positions for the ith shape is defined by [19] These coordinate position S i ((x i1 , y i1 ), (x i2 , y i2 ), . . . , (x ik , y ik )) in a shape vector form can be represented as that is further rearranged as a set of 2K vector: If there are L number of training examples, then the configuration matrix S ∈ R 2K×L can be written as

INTERPOLATION METHODS
Modeling or building a shape requires the function to be continuous. This creates a real problem if the modeling function consists of a set of discrete data points (landmark points). Therefore, it requires to construct a continuous function based on discrete data points. Here, an attempt is made to examine different interpolation techniques that fit the shape in a continuous manner from a set of discrete landmark points. Interpolation helps to construct a continuous function from a set of discrete data points. It is to be noted that, these interpolation techniques [20] only provide the conditions under which the lung field regions are modeled. The methods do not provide any solution to the explained techniques. Given K number of landmark points (x j , y j ) in the 2D-plane R 2 , the interpolating polynomial is defined as where x (x 1 , . . . , x k ) are the interpolation points and P(x j ) is the interpolating polynomial of the landmark point (x 1 , y 1 ), (x 2 , y 2 ), . . ., (x k , y k ). Solving a fitting polynomial P(x j ) that fits the landmark point (x j , y j ) is equivalent to solving a system of linear equation where A is a Vandermonde matrix.

Piecewise Polynomial Interpolation
A quite natural and different approach to approximate a function on an interval is to first split the interval into subintervals and then approximate the function by a polynomial of fairly low degree on each subinterval. Given K number of landmark point (x j , y j ) ∈ R 2 with the condition x 1 < x 2 < / < x k , the one dimensional piecewise interpolant in terms of piecewise function can be defined as x ∈ x j , x j+1 (10) where P j (x) is at least continuous everywhere in [x j , x j+1 ]. Continuity condition should hold at every data point, A piecewise polynomial function is defined as Hence, the piecewise polynomial interpolation problem is to determine the coefficients a j r for all of the intervals such that the resulting interpolant has desirable properties.

Linear Interpolation
Piecewise linear interpolation [21] is by far the most popular interpolation technique that finds numeral application in signal and image processing due to its faster performance.
If t is the local variable given by t The piecewise linear interpolation is then piecewise straight lines connecting two consecutive points of the interval (x j , x j+1 ).
or P(x) y j + tδ j (15) clearly, this equation represents a function of straight line that passes through the landmark positions (x j , y j ) and (x j+1 , y j+1 ).

Cubic Convolution Interpolation
The cubic convolution interpolation [22][23][24] function is obtained by imposing certain conditions on the interpolation kernel. The kernel is mainly composed of piecewise cubic polynomials defined over the unit subintervals [−2, 2]. For equally spaced data, the interpolation function can be defined as where c j are the coefficients to be determined and depend on the sampled data, u j is the kernel basis function, and h is the sampling interval. The cubic convolution interpolation is obtained by setting certain conditions on the kernel to maximize the accuracy. Keys [22] defined the cubic convolution interpolation kernel in terms of piecewise cubic polynomials over the subintervals (−2, −1), (−1, 0), (0, 1), and (1,2). Outside this interval, the interpolation kernel is assumed to be zero. By imposing this condition, the number of data samples that evaluates the interpolation function is reduced to four. Therefore, the kernel u will have the form: with u(0) 1 and u(n) 0 where n is a nonzero integer. As h is the sampling interval between two nodes, the difference between the two interpolating nodes say x j and x r will be (j-r)h. Now, if x is substituted by x j in Equation 16, then Eq. 16 will take the form: Now, if x r x j+1 , then Eq. 19 will have the form:

Cubic Spline Interpolation
Among the spline functions, the cubic spline functions [25][26][27][28] are the most preferred functions. The cubic spline functions fit the data very smoothly. More importantly, they do not have an oscillatory behavior that is common for higher order degree polynomials associated with interpolation, as the cubic Lagrange interpolation polynomial produce. A cubic spline is defined by where x j−1 ≤ x ≤ x j for j 1, 2, . . . , K. Clearly, the above equation contains four unknowns for each spline, a j , b j , c j , and d j , for a total of 4K unknowns over the whole interval.
The cubic spline interpolation must have a second order derivative and should satisfy the continuity condition over the interval [x j , x j+1 ]. This requires that P(x j ), P′(x j ), and P′′(x j ) are continuous over the interval [x j , x j+1 ]. To find the interpolating function, the coefficients a j , b j , c j , and d j must be determined for each of the cubic function. For K number of landmark points, there will be ( −1) cubic functions and each cubic function requires four coefficients. Hence, there is a total of 4(K−1) unknowns. So, to get all the coefficients 4(K−1), independent equations are required. To get these coefficients, certain conditions need to be assumed. The first two conditions for each spline are as follows: 1) The piecewise cubic function Q(x) must intersect each and every landmark data points (left and right). This requires Q(x j ) y j , j 1, . . . , (K − 1) 2) Moreover, Q(x) must be continuous on the interval [x 1 , x k ] which conclude that each sub-function must join at the landmark data points, i.e., Frontiers in Physics | www.frontiersin.org November 2021 | Volume 9 | Article 770752 These two assumptions give 2(K−1) equations. As each cubic function has to join smoothly to its nearest neighbors, the first and second derivatives are constrain to be continuous. 3) Q′(x) must be continuous on the interval [x 1 , x k ] to make the curve smooth across the interval, i.e., 4) Q"(x) will be continuous on the interval [x 1 , x k ]. Therefore, which gives us 2(K−2) equations. Hence, we get a total of (4K−2) equations and is therefore under-determined. In order to get a cubic spline function, two other conditions must be imposed to get 4K equations. As our purpose is to get a closed contour of the lung field, the following periodic constraints are imposed (also known as periodic conditions). 5)

Piecewise Cubic Hermite Interpolation
Piecewise cubic Hermite interpolation polynomial (PCHIP) [29,30] is a third order polynomial which has a shape preserving characteristic by matching only the first order derivatives at the data points with their neighbors (before and after) [31]. This characteristic makes it differ from the cubic spline function. If h j is the length of jth subinterval given by then the divided difference δ j will be δ j y j+1 − y j h j If ζ j is the slope of the interpolant at x j , then In shape preserving PCHIP function, the idea is to restrict the overshoot locally by determining the slope d j .
It is to be noted that, if δ j and δ j−1 are of opposite signs or either of the term is zero, then x j will be a discrete local minimum or discrete local maximum. So, we constrain the ζ j to be zero, i.e., ζ j 0, and, if δ j and δ j−1 are of the same sign and are of the same interval size, then ζ j is calculated using harmonic mean However, if δ j and δ j−1 are of the same sign but of different interval lengths, the δ j will be a weighted harmonic mean lead by the expression where w 1 2h j + h j−1 and w 2 h j + 2h j−1.

Makima Piecewise Cubic Hermite Interpolation
The Akima interpolation [32] between the interval [x j , x j+1 ] mimimizes the wiggling by selecting the derivatives as a linear combination of nearest slopes where δ j yj+1−yj xj+1−xj is the slope of the interval [x j , x j+1 ]. For any landmark point x j , the Akima takes five neighbors landmark point x j−2 , x j−1 , x j , x j+1 , and x j+2 to calculate the Akima derivative. However, the Akima interpolant function suffers from two major problems. First, if lower and upper slopes become equal, i.e., δ j−2 , δ j−1 , and δ j δ j+1 , both the numerator and denominator become zero and hence Akima derivative will have no solution. Second, the Akima interpolant produces overshoot or undershoot when the data are constant for the two consecutive nodes. To overcome these two problems, a modified Makima interpolation was introduced [33].

METHODS
In our method, the radiographs which are anatomically similar in all aspects are chosen to study the performance of the different interpolating polynomials in modeling of the lung field region. A publicly available JSRT dataset [35] has been used to study performance of each interpolating polynomial.
1) JSRT dataset: this dataset contains 247 posteroanterior (PA) chest X-ray images compiled by the Japanese Society of Radiological Technology (JSRT). Out of the 247 chest X-ray images, set of 154 images has lung nodules (100 malignment cases, 54 benign cases) and set of 93 images has no lung nodules. These images are of the size of 2048 × 2048 pixels with a gray scale Similarly, the landmark points defined for the right lung are grouped into the set of right coastal edge (1R-6R), right apical region (6R, 7R, 9R, 10R, and 11R), superior vena cava (11R-13R), heart's right ventricle boundary (13R-15R), and right hemidiaphragm (15R, 16R, and 1R) for the right lung field region. Here, the landmark points 7L and 9L are used to represent the left clavicle in combination with the landmark points 6L and 10L. Similarly, the landmark points 8R and 10R in combination with 7R and 11R are used to represent the right clavicle region. These two regions, the left clavicle and the right clavicle (each of which includes four landmark points), are included mainly for the study of landmark based postprocessing. It may be noted here that the region above the lung apices, region below the hemidiaphragms, and the mediastinum that encloses the heart, superior vena cava, descending aorta, etc., are not included in the present study. A side by side comparison of one lung region to the other gives important information about the shape dissimilarity like lung's contraction or expansion that helps to analyze different lung diseases [36,37]. Subdividing these regions into different segments may give a better analysis of the diseases. To make these regions independent, the lung regions can be subdivided into three different regions, namely, the apex and medial and lower regions, by defining few co-linear landmark points at the  Frontiers in Physics | www.frontiersin.org November 2021 | Volume 9 | Article 770752 10 coastal edges. To fulfil these criteria, the landmark points 2L, 3L, 4L, and 5L that belong to the left coastal edge are made colinear with the landmark points 16L, 15L, 14L, and 10L, respectively. Similarly, the landmark points 2R, 3R, 4R, 5R, and 6R that belonging to the right coastal edge are made colinear with the landmark points 15R, 14R, 13R, 12R, and 11R, respectively.

Evaluation Metrics
In order to compare the different interpolation techniques, two different performance measures, namely, Jaccard similarity coefficient and Dice similarity coefficient [38], are used that evaluate the quality of delineation and interpolation. 1) Jaccard similarity coefficient (JSC): the Jaccard similarity coefficient or Jaccard index (J) between the ground truth shape and the interpolated shape is defined as 2) Dice coefficient (DC): the Dice coefficient between the ground truth shape and interpolated shape is defined as where S GT is the region enclosed by the ground truth shape and S IS is the region enclosed by the interpolated shape. The Jaccard coefficient and Dice coefficient between the two shape instances give us an idea of how similar the two sets are. The JSC and DC take a value between [0, 1]. The zero indicates that the two shape instances do not coincide with each other, whereas one indicates that the two shape instances completely coincide with each other.
3) Execution time: estimating the execution time often becomes mandatory when evaluating the performance of an algorithm. Knowledge about the execution time of program is of utmost importance in selecting an appropriate method that models the lung field shape within a specified amount of time. The polynomials that take a longer time than the specified amount of time cannot be preferred for the lung field modeling.

SIMULATION RESULT
In this work, five different interpolating polynomials are studied for the left and right lung field modeling using a set of discrete labeled points called anatomical landmark points. For this purpose, three similar radiographs from the publicly available JSRT dataset are selected. We identified and selected 17 anatomical landmark points for the left lung region and 16 anatomical landmark points for the right lung region in the selected set of images, as shown in Figure 1. As the selected landmark points are not sufficient to form the lung contour, piecewise interpolating polynomials are used to create additional intermediate semilandmark points between each pair of the consecutive landmark points. Our intention is to get a shape of highest similarity index by interpolating minimum number of the secondary landmark points (i.e., intermediate semilandmarks). Hence, an analysis is made to find a shape  Figure 2 shows the lung shape modeling of the image data set JPCLN001 using selected interpolating polynomials with linear, cubic convolution, cubic spline, PCHIP, and Makima interpolation methods, respectively, using one, three, and ten intermediate semilandmark point(s). A similar attempt is also made to represent two other sets of images in Figures 3, 4 with the selected interpolating polynomials using one, three, and ten secondary landmark point(s). Here, red and green contours are used to represent ground truths and lung field boundaries obtained using different interpolating polynomials, respectively. Performance of each interpolation method is evaluated, for the left and right lung field modeling, against the number of intermediate semilandmark points, in terms of Jaccard similarity coefficient and Dice coefficient. Figure 5 shows the performance of each interpolating polynomial in terms of Jaccard similarity coefficient and Figure 6 is used to represent their performance in terms of Dice coefficient. A tabular form of the different interpolating polynomials for which Jaccard's and Dice's coefficients remain optimal with the minimum required condition is shown in Table 1. Here, optimality refers to a situation in which JSC and DC attain the best or most favorable value beyond which no such significant change is sought. The optimality condition refers to a condition that is required (in terms of the minimum number of intermediate semilandmark points between each consecutive anatomical landmark pair) for the JSC and DC to attain the best or most favorable value. The execution time of these interpolating polynomials is evaluated for the three intermediate semilandmark points and is shown and compared in Figure 7. The simulation work is carried out using MATLAB R2018b installed under the Fedora Linux kernel version 5.6.13-300.fc32.x86_64 in HP ENVY 15-k004tx Notebook PC with the configuration of 1.7 GHz Intel Core i5-4210U processor having Intel HD Graphics 4400 and 8 GB of RAM.

CONCLUSION
Here, we have presented an effective method of anatomical landmark point selection and their minimization and modeling of the lung field shape using five different interpolation techniques, namely, linear, cubic convolution, cubic spline, PCHIP, and Makima. Each interpolation method is applied locally with a certain number of intermediate semilandmark point(s) between each consecutive anatomical landmark pair. We measured and compared the modeling performance of each interpolation technique with the prepared ground truth in terms of Jaccard similarity coefficient (JSC) and Dice coefficient (DC). The modeled shape using linear interpolation method with an execution time of 4.97954 s ensures a shape of minimum similarity index (with an average JSC of 95.36 and 95.65% and with an average DC of 97.62 and 97.78% for the left and right lung fields, respectively) and has no impact of increasing the number of intermediate semilandmark points. Therefore, optimality condition for the linear interpolation method cannot be defined. However, for PCHIP and Makima interpolation methods, an incremental change in JSC and DC is observed as the number of intermediate semilandmark points between each consecutive anatomical landmark pair increases from one to three intermediate semilandmark point