Predicting lung cancer's metastats' locations using bioclinical model

Background Lung cancer is a global leading cause of cancer-related deaths, and metastasis profoundly influences treatment outcomes. The limitations of conventional imaging in detecting small metastases highlight the crucial need for advanced diagnostic approaches. Methods This study developed a bioclinical model using three-dimensional CT scans to predict the spatial spread of lung cancer metastasis. Utilizing a three-layer biological model, we identified regions with a high probability of metastasis colonization and validated the model on real-world data from 10 patients. Findings The validated bioclinical model demonstrated a promising 74% accuracy in predicting metastasis locations, showcasing the potential of integrating biophysical and machine learning models. These findings underscore the significance of a more comprehensive approach to lung cancer diagnosis and treatment. Interpretation This study's integration of biophysical and machine learning models contributes to advancing lung cancer diagnosis and treatment, providing nuanced insights for informed decision-making.


Background
Treating cancer is a critical challenge in modern medicine, as it affects millions of people worldwide and can have devastating effects on both patient health and quality of life.Lung cancer, in particular, is one of the most prevalent and deadly forms of cancer worldwide [1].One of the major challenges in treating lung cancer is the development of metastases, which are secondary tumors that develop from cells that have spread from the primary tumor site to other parts of the organ (or even the body) [2,3,4].The location of these metastases can greatly affect a patient's prognosis and the effectiveness of treatment [5].
In particular, for lung cancer diagnosis, healthcare professionals commonly use a computed tomography (CT) and Positron Emission Tomography CT (PET-CT) to diagnose the patient's clinical condition [6].This non-invasive imaging modality allows for the simultaneous acquisition of functional and anatomic information, providing detailed insights into the metabolic activity of the tumor and its relationship to surrounding tissue.Additionally, this analysis provides the anatomy and structure of the lung and surrounding tissue, which can be used to determine the size and location of the tumor.Based on this data, clinicians look for the primary cancer tumor's as well as metastasis' properties to determine the course of treatment [7,8].
Recently, to partially automate the process of parsing the required clinical data from CT imagery, researchers have used computer vision and ML algorithms in general and to detect lung cancer, in particular [9,10,11,12,13].These algorithms can (semi-)automatically detect and classify lung nodules, reducing the dependence on human interpretation and improving the consistency of diagnoses.In addition, biophysical models gathering popularity in the clinical domain as these getting better at predicting clinical outcomes over time [14,15,16,17,18].For instance, [19] constructs a mathematical model that integrates let-7 and miR-9 expression into a signaling pathway to generate an in silico model for the process of epithelial-mesenchymal transition.The authors validate their model using in vitro data collected by testing the effects of EGFR inhibition on downstream MYC, miR-9, and let-7a expression.[20] propose a multicomponent mathematical model for simulating lung cancer growth as well as radiotherapy treatment for lung cancer, showing promising predictive accuracy compared to a relatively small set of in vivo data points.
Unfortunately, current PET-CT technology is lacking the ability to clearly capture small metastasis, usually less than 2 mm in diameter due to the mix of cancer and healthy cells together with the noise occurring during the sampling process [21].Missing small-size metastasis can lead to choosing the sub-optimal course of treatment and result in clinically catastrophic results.In this work, we propose a bio-physical model to predict from PET-CT imagery the locations of small-size metastasis that the current PET-CT methods are missing.Data-driven models are increasingly being utilized in the field of oncology [22,23].By utilizing patient-specific biological and clinical data, these models aim to provide a more comprehensive representation of the disease and its progression, thereby enabling more targeted and effective treatment strategies [24,25].The use of these models in the treatment of lung cancer holds great promise, and this study aims to evaluate their ability to predict the location of lung cancer metastases, with the ultimate goal of improving patient outcomes [26].
The novelty of this work lies in the combination of three types of bio-physical models associated with cancer settlement, flow in the bloodstream, and growth inside healthy tissue.We tested the proposed model on the historical data of 10 lung cancer patients with metastasis, obtaining a 74% accuracy in the prediction of the metastasis locations.
The remainder of the paper is organized as follows.Section 2 outlines the proposed bioclinical model, divided into the biophysical model, initial condition construction from the 3D CT image, and the metastasis probability heatmap generation process.Section 3 presents the performance of the proposed model on real-world clinical data.Finally, section 4 discusses the obtained results with their clinical applications and proposes possible future work.

Model definition
The spread dynamics of metastasis originating in the primary tumor can be associated with three main biological processes: 1) the flow of cancer cells in the bloodstream; 2) the settlement of cancer cells in the tissue; and 3) the spatial spread of cancer polyps [27,28,29].All these biological processes are spatio-temporal in nature and therefore influenced by the spatial distributions of blood vessels, healthy tissue, and the original cancer cells.To capture these settings, we use a chest PET-CT image, obtaining a three-dimensional (3D) gray-scale image (I ∈ R x×y×z ) where x, y, z are the CT image's dimensions.Using I, and by simulating the biological and clinical occurring in vivo, one can predict the locations of cancer's metastasis, if these exist.A schematic view of the model's components and the interactions between them is summarised in Fig. 1.Namely, the algorithm can be divided into three main components.First, biophysical modeling is responsible for predicting the ability of cancer cells to colonize different parts of the lungs over time.Second, the computer vision algorithm accepts the 3D CT image and produces the parameterized initial condition construction for the biophysical model.Finally, an algorithm that utilizes the biophysical model to generate the metastasis heatmap.

Biophysical modeling
To capture the biological processes occurring in vivo in a patient's lungs in the context of lung cancer migration, we define a model, M .Formally, M is a function the location of the primary tumor in the image I, and I ∈ {0, 1} x×y×z is a 3D binary image with 1 for locations where cancer cells can settle and 0 otherwise, τ ∈ R 3 is the location of interest to evaluate in the image I, and c ∈ N is the total number of cancer cells predicted to settle in location τ .
Formally, the model operates as follows.First, we find the blood vessel b s ∈ B s that is closest to the primary tumor location (p[l]) by computing the distance between the primary tumor location and each blood vessel ∀b s ∈ B s and taking the minimal value.Afterward, in the same manner, the blood vessel closest to x.These two locations are marked by S ∈ b S s and T ∈ b T s , respectively.Afterward, we find all the paths in the flow graph (G) shorter than a length ν such that starts at b S s and ends at b T s using the breadth-first search (BFS) algorithm [30].
It is known that the cancer cell population's size is reducing exponentially over time [31,32].Hence, if the absolute value of cancer cells in the cardiovascular is less than a pre-defined threshold ξ ∈ N, the population size is set to be zero and by that defines the value of ν and the stop condition for this step in the model.

Parameterized initial condition
Bio-physical model temporal prediction Hence, we proposed a prediction algorithm A that gets as an input the location of a primary tumor τ l and its size τ s , the blood vessels, and the connections between them as a graph G = (B, E), and a binary 3D tensor of healthy tissue cancer metastasis can colonize and returns a 3D tensor indicating the normalized probability that metastasis would take occur in each segment of the lungs.Namely, one can define algorithm A as follows: 3 is the location of the prime tumor in I, τ s ∈ R is the size of the primary tumor, and P (I) ⊂ R x•y•z is a distribution function allocating a probability of metastasis occurrence for each location in I.

Metastasis probability heatmap
A operates as follows, starting from the primary tumor, the tumor grows according to the model proposed by [33] until it reaches a blood vessel.At this point, cancer cells are assumed to drop from the tumor into the bloodstream at some rate d ∈ R + .The cancer cells are floating in the bloodstream according to the model proposed by [34].In a probabilistic manner, cancer cells leave the bloodstream toward the tissues around the blood vessels and try to grow into a metastasis.This colonization process is governed by the model proposed by [35].

Initial condition construction
In order to utilize the proposed bio-physical with an inputted PET-CT 3D image, one first needs to process the image to the required parameterized format that can be used as an initial condition for the model.Namely, it is required to extract the location and size of the primary tumor, the blood vessels in the geometry, and the locations of healthy tissue the cancer cells can colonize.
The first task can be achieved with relatively high accuracy using the method proposed by [36], extracting its location (τ l ∈ R 3 ) and size τ s ∈ R in I.
Next, the blood vessels are defined using a graph G = (B, E) such that E ⊂ B × B. The blood vessels graph, G, is obtained as follows.First, a search radius R ∈ R > 0 is initialized manually by the user to be R 0 .The blood vessels approximated by a cylinder geometry from a 3D CT image are obtained using the algorithm proposed by [37].We denoted the set of blood vessels to be B where b ∈ B : b := (c, h, r, o xy , o xz ) such that c ∈ R 3 is the blood vessel's center of mass, h ∈ R is the height of the blood vessel, r ∈ R is the average radius of the blood vessel, o xy ∈ [0, 2π] is the orientation in the xy axis, and o xz ∈ [0, π] is the orientation in the xz axis.For convenience, we treat B as a list that is sorted according to the h parameter, which can be obtained in O(|B|log(|B|)) operations using Quicksort algorithm [38].] which stands for the start and end locations of a cylinder blood vessel, respectively.To contract G, we first initialize it such that B are the nodes and E is an empty edge set.Now, in an interactive manner, we add edges to E with the following logic: each blood vessel, b ∈ B is checked if its start, sl, is inside a 3D sphere defined by the search radius R with either the beginning or end location of another blood vessel.If so, we add it to the edge set, E, such that its end is connected to the closest blood vessel b ∈ B in terms of ||b i sl − b j el || where b i , b j ∈ B ∧ b i ̸ = b j .If no edge is added, the search radius, R, is increased according to the formula R ← R + δ R such that δ R ∈ R + is a pre-define hyperparameter.Once G becomes a connected graph, the process holds.This is due to the fact that it is biologically known that the cardiovascular system is connected and as such, once this criterion is met, it is assumed the graph is properly constructed.Finally.we compute the maximum (in the manner of radius) spanning tree of the graph G using the method proposed by Al Mamun and Rajasekaran [39].
Finally, the 3D image is first divided into 2D images.Then, the location of the healthy tissue is obtained by using a threshold-adoptive Canny algorithm [40] to first find the edges between the inner part of the lungs and the outer one, alongside other edges in the image that are noise.In order to remove the unwanted edges, we used Hough Transform [41] to detect two ellipsoids from the edges that approximate the outer divide edges.Afterward, the edges connected to the ones obtained from the Hough Transform's ellipsoids, are added until a close polygon is obtained.The inner side of these two shapes is defined to be the healthy tissue area.After the two shapes are obtained for each 2D image, we reconstruct the entire 3D shape using the Laplacian smoothing method [42].

Metastasis heatmap generation
Hypothetically, one could solve the heatmap generation as a regression task, solving the model numerically with a given initial condition at time t 0 ∈ R and extracting the state of the model at any desired time, t f > t 0 .However, as shown in [34], computing such a dynamical system numerically is both computationally expensive and unstable.The main issue lies in the graph-based Navier-Stokes equations [43] one needs to solve.As such, to tackle this challenge, we generate the metastasis heatmap as follows.An 3D binary (I) is uniformly sampled with a grid Ω of size α x , α y , α z ∈ N for the x, y, z axes, respectively.The probability of metastasis occurrence in each point in Ω is obtained by solving the biophysical model (see Section 2.1) with the initial condition (see Section 2.2) with one modification.The blood flow component is computed for only the path in G from the initial tumor location and the tested point in the grid.Once all points in the grid are tested and the amount of cancer cells is computed for a pre-defined stop time T ∈ R + , the values are normalized using the L 1 metric to define probability.

Results
To evaluate the performance of the proposed bioclinical model, we used samples from 10 patients.All patients are diagnosed with metastases in the lungs in the Sheba Hospital (Israel) between 2019 and 2022.The metastasis instances have been manually tagged by a clinician, indicating the location in the lung and the size, alongside a manual tagging of the prime tumor's location and size.For these patients, a CT image before the detection of metastasis and afterward are obtained, without treatment in between, allowing to validate of the metastasis occurring locations without outside influence.
Table 1 outlines the result of this computation, divided into the Hard and Soft classifier scores.Overall, the bioclinical model shows 74% accuracy, on average, in predicting the location of lung metastasis from lung cancer with a 1.9% standard deviation.This indicates that the model receives stable levels of performance over the sampled population.The scores describe the accuracy metric of a Soft-classifier between two 3D images I 1 ∈ {0, 1} x•y•z and I 2 ∈ R x•y•z where I 1 is a binary image that indicates if a cell in the image is part of a cancer polyp or not and I 2 is the prediction of the proposed model (see Section 2.3): In addition, we formally define the Hard-classifier score to be

Discussion
The spread of lung cancer from its primary site to other parts of the lungs is a critical aspect of lung cancer progression and is associated with decreased survival rates [44,45].Early identification of metastatic lesions is crucial for prompt and effective treatment, as it can significantly impact patient outcomes [46].However, conventional imaging techniques such as computed tomography (CT) scans have limitations in detecting small metastases [47].In this study, we proposed a personalized biophysical-based bioclinical mathematical model that accepts a patient's 3D CT image and produces a 3D heatmap of the probability a metastasis would develop in each region of the lungs.
We tested our bioclinical model on real-world clinical data and validated it with clinician domain experts to get a baseline.As seen in Table 1, the proposed bioclinical model provides around 74% accuracy in predicting the location of metastasis.This means, that on average, three out of four predictions of the model marked locations with metastasis in the CT that would not be marked otherwise.However, these results do not provide a lower-boundary accuracy measurement of the model's performance as locations marked by the algorithm with metastasis that was without are either a wrong prediction or that metastasis was not found during the validation phase and would be created given more time.For example, let us examine a 2D slice of one of the patients in the sample.Fig. 2 shows the original CT image slice (with the manual mark of the primary tumor), the blood vessels marked in blue, the healthy tissue locations, and the model's heatmap prediction -from left to right and top to bottom.One can notice that the model predicts metastasis for the right-top corner of the lungs in this z-axis value.However, metastasis is not found there.This is because either the model is mistaken or not enough time passed from the first and second CT scans in order to allow metastasis to grow into a detected size.
One of the key advantages of the bioclinical model is the fact it is based on biophysical modeling of the cancer spread which allows one to integrate diverse data sources and provides a more comprehensive understanding of the disease compared to purely data-driven methods.Theoretically, one could aim to develop a data-driven solution to this time-series task using ML or deep learning methods.However, to do so, an extensive datagathering process of multiple CT scans of lung cancer would be required, which would be extremely expensive, logistically complex, and ethically questionable.On the other hand, the proposed approach utilizes previous biological, physical, and clinical knowledge, resulting in a need for just a relatively small dataset, used entirely to validate and evaluate the performance of the proposed model.
The results of this study demonstrate the potential of mathematical models in predicting the location of lung cancer metastases.Our findings show that the bioclinical model is capable of providing personalized predictions of metastatic spread with decent levels of accuracy.Hence, this model has the potential to inform more targeted and effective treatment strategies for lung cancer patients, ultimately improving patient outcomes.As such, this study provides a step forward in the development of more personalized and effective treatments for lung cancer.The use of bio-clinical mathematical models to predict the location of metastases holds great promise for improving patient outcomes and should be a focus of future research efforts in this field.However, our bioclinical model can (and should) be further improved.First, the currently used biomathematical model of cancer colonization on healthy tissue does not take into consideration the spatial properties of the cancer cells.Second, cancer cells continue to mutate over time which causes a wide range of outcomes and can significantly alter the course of metastasis formation.Third, taking into consideration additional clinical data about the patient such as gender, background diseases, smoking, and others, it will be possible to obtain more accurate the location of cancer metastases [48].In addition, this study was limited by its sample size and the heterogeneity of the patient population.Thus, additional large-scale, multicentered trials are necessary to further test the accuracy and clinical utility of the proposed model, to obtain more statistically representative results.

Declarations Ethical Statement
None

Figure 1 :
Figure 1: A schematic view of the model's components and the interactions between them.

Table 1 :
obtained by performing a threshold ζ on the predicted image I 2 .I.e. each value α ∈ I 2 is replaced with 1 if α > ζ and 0 otherwise).The proposed model's performance, divided into patients.