Log-Linear Model and Multistate Model to Assess the Rate of Fibrosis in Patients With NAFLD

In this paper, the deleterious effects of obesity, type II diabetes, and insulin resistance, systolic and diastolic hypertension on the rate of progression of fibrosis in patients with non-alcoholic fatty liver disease (NAFLD) are illustrated using a new approach utilizing the Poisson regression to model the transition rate matrix. The observed counts in the transition count matrix are used as the response variables and the covariates are the risk factors for fatty liver. Then, the estimated counts from running the Poisson regression are used to estimate the transition rates using the continuous-time Markov chains (CTMCs) followed by exponentiation of the estimated rate matrix to obtain the transition probability matrix at specific time points. A depicted, hypothetical, observational, prospective longitudinal study of 150 participants followed up every year for a total of 29 years recording their demographic characteristics and their timeline follow-up is demonstrated. The findings revealed that insulin resistance expressed by HOMA2-IR had the most deleterious effects among other factors on increasing the rate of fibrosis progression from state 1 to state 2, from state 2 to state 3, and from state 3 to state 4. The higher the level of HOMA2-IR is, the more rapid the rate of progression is. This analysis helps the health policymakers and medical insurance managers to allocate the financial and human resources for investigating and treating high-risk patients with NAFLD. In addition, this analysis can be used by pharmaceutical companies to conduct longitudinal studies to assess the effectiveness of the newly emerging anti-fibrotic drugs.


INTRODUCTION
Continuous-time Markov chains (CTMCs) are valuable mathematical and statistical tools. They are of great potential to evaluate the disease progression over time. NAFLD is an increasingly worldwide epidemic, paralleling the rise in the incidence of obesity and type II diabetes which are approaching a pandemic level. This emerging health problem is mainly due to sedentary life styles and western eating habits of ingesting high-fat and cholesterol diets. The pathological milestone for NAFLD is insulin resistance and hyperinsulinemia. This hyperinsulinemia will eventually result in type II diabetes with adverse complications like vascular diseases and fatty liver disease. On the other hand, NAFLD can cause type II diabetes, as the prevalence of diabetes in NAFLD ranges between 18 and 45%. Moreover, the prevalence of NAFLD in type II diabetic patients ranges between 49 and 75% [1].
Non-alcoholic fatty liver disease can be modeled using the simplest form for health, disease, and death model. It is composed of four states. One state is for susceptible individuals with risk factors like type II diabetes, dyslipidemia, obesity, and hypertension. The second state is the NAFLD phenotypes. The other two competing states for death are: one for liver-related mortality as a complication of NAFLD and the other state is the death causes unrelated to liver disease [2]. This model is shown in Figure 1.
In addition, NAFLD can be modeled in more elaborate expanded form which includes nine states [3]. The first eight states are the states of disease progression over time and the ninth state is the death state [2], as illustrated in Figure 2.
Moreover, fibrogenesis is a dynamic process that goes back and forth among the early stages of the expanded model. Stages of fibrous tissue formation are early seen in NAFLD process. Fibrosis progresses if the risk factors for its formation are not eliminated. Fibrosis is an ominous sign for loss of liver functions. When the fibrous tissue develops, a subset of the early states is used to relate these risk factors to the rates. Definition of each state is shown in Figure 3 [4,5]. F0 indicates that there is no fibrous tissue. F1 means that fibrous tissue is Abbreviations: CC, compensated cirrhosis (stage 4); CTMC, continuous-time Markov chains; DCC, de-compensated cirrhosis (stage 5); EM, extramortality (stage 9); HCC, hepatocellular carcinoma (stage 8); LT, liver transplant (stage 6); NAFLD, non-alcoholic fatty liver disease; NAFL-NO FB, non-alcoholic fatty liver with no fibrosis (stage 1); NASH, non-alcoholic steatohepatitis; NASH-NO FB, non-alcoholic steatohepatitis with no fibrosis (stage 2); NASH-FB, non-alcoholic steatohepatitis with fibrosis (stage 3); PLT, post-liver transplant (stage 7); T2DM, type 2 diabetes mellitus.
FIGURE 1 | General model structure [2]. deposited due to non-alcoholic steatohepatitis (NASH) and not due to any other causes of liver disease; all other stages (F2 and F3) are maintained and are progressing over time by the presence of NASH till the liver cirrhosis (F4). If this NASH is well-treated by controlling the risk factors that induce it, the fibrous tissue formation and deposition will regress as shown in the Figure 3.
Kalbfleisch and Lawless [6] related the instantaneous rate of transitions from state i to state j to covariates, by regression modeling of the Q transition rate matrix using log-linear model for the Markov rates.
The previous studies, as will be later mentioned in the discussion, mainly included the evaluation of 2 paired biopsies, initial and second biopsies, then grouping the patients according to the findings into stable, regressors, slow progressors, and rapid progressors without precise estimation of specific transition rates among states and without proper estimation of the predictive value of each variable on these specific rates. The rate of fibrosis progression was estimated by dividing the difference in fibrosis stage between biopsies by the time interval (in years), and this was performed to account for the time differences between the biopsies [7]. Additionally, either univariate or multivariate linear regression was used to relate the risk factors with the rate of progression. As will be later mentioned in the discussion, some studies utilized multivariate logistic regression instead of linear regression.
factors. Third, it recommends using continuous-time Markov chains to obtain the transition probabilities and predict the expected counts of patients in each state at a specific time point in the future. The counts of each transition can be modeled as a function of some explanatory variables reflecting the characteristics of the patients. The Poisson regression model specifies that each response y i is drawn from a Poisson population with parameter λ i , related to the covariates. The primary equation of the model is The most common formulation for the λ i is the log-linear model: where β is the k × 1 parameter vector, m is the number of predictors, and Xs are the predictors. The expected number of events per period is given by: The observed counts in the transition counts matrix are used as response variables. The covariates are the risk factors for the fatty liver, where the participants are subjected to the same follow-up periods. Then, the estimated counts obtained from running the Poisson regression are used as input to estimate the transition probability matrix using the CTMC. The author clarifies this procedure using a hypothetical example in the form of an observational prospective longitudinal study. Attia [8] used the same data in previous work. Still, in this article, the author discusses the issue of multicollinearity, the equidispersion Poisson of response variables in the presence of excess zeros, and more comparisons between this work and previous works. Finally, the author highlights the benefit of such analysis to pharmacoeconomic evaluation and healthcare economics.

Statistical Analysis
The relationship between the response variable (counts of transitions) and the predictors was non-linear as shown by Lowess smoother. Restricted cubic spline was used to obtain a suitable functional form of the predictors to fit a Poisson model using STATA 14. The CTMCs were used to obtain transition probability matrix and transition rate matrix. p-Value of <0.05 was considered statistically significant; all tests were two-sided tests (refer to Appendix A).

RESULTS
Summary of the transition counts among the states is shown in Table 1. The observed counts of the participants over the 29 years of follow-up are demonstrated in Table 2. The distribution of these counts was Poisson (mean = variance). The dispersion indices for the nine response variables ranged between 0.82 and 1.34. In Appendix B, more figures illustrate the dispersion of these response variables. They were also correlated with high statistical significance (p-value = 0.000) as shown in Table 3. Initial observed rates are as follows: The application of Lowess smoother showed the nonlinear relationship between the predictors and the response variables as shown in Figure 4. In Supplementary Materials, more figures illustrating these relationships between the different predictors and response variables are clearly shown (refer to also Appendix B).
The continuous predictors (age, BMI, HOMA2-IR, LDL-chol, and systolic and diastolic blood pressure) were highly correlated with a correlation coefficient of 0.99 and a condition number for data matrix (X'X) of 453.57. The condition number for the data matrix (X'X) constructed from the transformed variables used in the analysis (HOMAsp1, HOMAsp2, LDLsp2, sysPS2, diasPS2) was 54.89. These transformed variables were also highly correlated. However, the condition number did not exceed 100. Thus, this multicollinearity can be considered non-harmful, and it will not affect the analysis [9].
The observed counts were the response variables used to fit the Poisson regression model. For each transition count, the model that represented the most explainable covariates with their estimated beta coefficients and the corresponding incidence rate ratios were illustrated in Appendix B. The transitions were subdivided into progressive transitions and regressive transitions. The main important result is that HOMA2-IR is positively correlated with all progressive transitions and is inversely related to the regressive transitions, with control of other variables, as shown in Tables 4, 5.  Progressive Transitions With Rates λ 01 , λ 12 , λ 23 , λ 34 HOMA is 6.174, which is not highly statistically significant (p = 0.046). The expected increase in log count of transition from F3 to F4 for one-unit increase in transformed HOMA is 10.866, which is highly statistically significant (p = 0.000).
Regressive Transitions With Rates µ 10 , µ 21 , µ 32 , µ 20 , µ 31 Persons with high insulin resistance (elevated HOMA2-IR) had 0.011 times the rate of transition from F1 to F0 compared to persons with normal level of HOMA2-IR (persons with normal insulin sensitivity), also the rate decreased to 0.037 times for the rate of transition from F2 to F1, decreased to 0.005 times for the rate of transition from F3 to F2, decreased to 0.066 times for the rate of transition from F2 to F0, and decreased to 0.084 times for the rate of transition from F3 to F1. Statistically speaking, the expected decrease in log count of transition from F1 to F0 for one-unit increase in transformed HOMA is 4.489, which is not statistically significant (p = 0.13). The expected decrease in log count of transition from F2 to F1 for one-unit increase in transformed HOMA is 3.288, which is not statistically significant (p = 0.242). The expected decrease in log count of transition from F3 to F2 for one-unit increase in transformed HOMA is 5.214, which is not statistically significant (p = 0.103). The expected decrease in log count of transition from F2 to F0 for one-unit increase in transformed HOMA is 2.713, which is highly statistically significant (p = 0.000). The expected decrease in log count of transition from F3 to F1 for one-unit increase in transformed HOMA is 2.476, which is highly statistically significant (p = 0.000).

Validation and Residual Analysis
Poisson model fitted the data. When comparing the full model to the null model, there was a marked decrease in the deviance goodness of fit. Also, the akaike information criteria (AIC) and bayesian information criteria (BIC) were less than their values in the null model, indicating the full model improvement. In addition, there was an increase in the pseudo-R 2 , indicating the ability of the model to predict the outcome better than the null model. The output results of the null model for each of the transition counts are shown in Tables 6, 7.
The observed rates were approximately equal to the estimated rates after running the Poisson model as shown in Table 8.
Analysis of residuals especially Pearson residuals, for all transitions, revealed that they were not normally distributed. The Q-Q plot for these residuals did not exhibit normality. The Pearson dispersion statistics for each count was less than one supporting no evidence of overdispersion of the fitted model despite the apparent excess zeros (Appendix B, Table  21). Generalized Poisson regression did not fit the data. In Appendix C, more figures of these residuals are presented [10,11].
This observational study aims to obtain preliminary and explanatory ideas about the effects of each risk factor on the different transition counts among the states. This Poisson regression is not aiming for future prediction of counts. Although the residuals are not normally distributed, such analysis can give fair provisional ideas about the effects of the risk factors. The Poisson model gives unbiased estimates for the regression coefficients, but these coefficients' statistical significance should be cautiously taken.

CTMCs Utilize the Estimated Counts From Log-Linear Model to Obtain the Transition Probability Matrix
For each of the transitions from state (i) to state (j), where λ ij denotes the counts of transition from state (i) to state (j), and after running the Poisson model, the linear predictor ln λ ij = X ′ n B for each participant (n) is exponentiated, E y n X n = λ ij = exp X ′ n B , to obtain the expected counts of transition that this participant had accomplished during this 29 years. Then, the result is rounded to the appropriate integer and summed to get all counts for this transition and then compared to the observed counts accomplished by all participants.
The n i+ is the total marginal transition counts out of this state, which is assumed to be constant. The estimated counts from running the Poisson model will be substituted in the transition count table. Because the marginal counts are assumed to be the same and when using the initial rates calculated as θ 0 = n ij n i+ where the n ij is the transition counts from state i to state j, the Q matrix can be estimated. (Hint: the numerators below are the estimated counts obtained from running the Poisson regression).    The probability matrix at any specific time point in the future can be obtained by exponentiation of this matrix because the chain is homogenous continuous-time Markov chains with constant rates over time. This result can also be obtained by solving the forward Kolmogorov differential equations, which will yield the same result as the exponentiation of the estimated Q matrix (refer to Appendix D).
The transition probability matrix is obtained by exponentiation of this estimatedQ matrix after 1 year:

Goodness of Fit for the Multistate Markov Model
To calculate goodness of fit for multistate model used in this example, it is like the procedure used in contingency table: Step 3: Calculate the expected counts in this interval.
n 1+ = 2050, n 2+ = 1247, n 3+ = 783, n 4+ = 120 Multiplying each row in the probability matrix with the corresponding total marginal counts in the observed transition counts table in the same interval yields the expected counts as shown in Table 9.
Step 4: The observed counts, O ij , are shown in Table 2. The expected counts, E ij , are obtained from the previous step and are shown in Table 9. Then, apply the Pearson statistic formula which yields a value of 1,140.097 with high statistical significance (p = 0.000). Apply So, from the above results, the null hypothesis is rejected while the alternative hypothesis is accepted and the multistate Markov model fits the data, that is to mean, the future state depends on the current state with the estimated transition rates and probability matrices as obtained.

Health Economics
This transition probability matrix can predict the count of patients in each state at specific time point, for example, if a cohort of 6,000 patients with the following number in each state is 3000 1800 1020 180 0 , after 1 year the predicted counts will be 2895 1879 1044 154 28 . This count can be achieved by multiplying the initial count distribution of the patients with the transition probability calculated at the required specific time Let u (0) be the size of patients in a specific state at specific time t = 0. The initial size of patients is U (0) = u j (0), as there are 4 transient states (F0 to F3) and 1 absorbing state (F4), where u j (0) is the initial size or the number of patients in state j at time t = 0 given that u 5 (0) = 0, i.e., initial size of patients in state 5 (absorbing state) is zero at initial time point = 0. As the transition or the movement of the patients among states is independent, at the end of the whole time interval (0, t), there will be u j (t) patients in the transient states at time t, and there will also be u 5 (t)patients in state 5 (F4 = liver cirrhosis) at time t.
In addition, the state probability distribution π (t) , which is the probability distribution for each state at a specific time point given the initial probability distribution π (0), can be estimated by applying the following formula: In this example, the cohort of 6,000 patients has initial probability distribution of 0.5 0.3 0.17 0.03 0 , after 1 year, the state probability distribution will be 0.4825 0.3131 0.174 0.0257 0.0046 .
Pharmaco-economic evaluation can be assessed in three categories: the cost-benefit analysis, the cost-effectiveness analysis, and the cost-utility analysis. The evaluation utilizes the predicted number of patients in each state estimated every year, the state probability distribution predicted every year, the costs of investigations and treatments, and the quality adjusted life years for the patients [12,13].
This approach differs from the one used by Rustgi et al. [14] who depends on calculating the cost-effectiveness analysis by following a cohort of patients, all starting at the same initial state till death. While in the approach proposed in this article, sampling the population and estimating the transition probability matrix to predict the counts in the future, any cohort of patients can be followed up utilizing the information gained from sampling the high-risk population.

DISCUSSION
The following discussion elucidates the agreements and comparisons between the findings in this study with the findings in the previous one high-lightening the effects of various factors on progression rate of fibrosis in NAFLD patients.
Hui et al. [15] conducted a study on 17 patients who had previous liver biopsy showing evidence of steatosis with or without the presence of necroinflammation and fibrosis. Those patients underwent second liver biopsies with a median of 6 years apart (range: 3.8-8 years). More than half of them developed progressive fibrosis compared to the initial biopsy; because that these patients suffered from steatohepatitis, although there was no significant correlation between the degree of steatohepatitis and the degree of fibrosis between the two biopsies. However, the correlation was significant between the initial stage of fibrosis and the fibrosis grade in the second biopsy. Also, the clinical and laboratory parameters were not statistically significant between the recorded values during the first and the second biopsies. The changes in these parameters also showed no significant correlation with changes in the scores of steatosis, necroinflammation, or fibrosis. There was a negative correlation, although non-significant, between the change in the score of fibrosis and each of the changes: in the BMI, plasma total cholesterol levels, plasma triglyceride levels, and glycosylated hemoglobin. During the follow-up, two patients developed type II diabetes and one developed hypertension but without progression of fibrosis, their initial biopsy revealed F0, and the second one was also F0. Another patient developed type II diabetes with evolution of the fibrosis from F0 to F2, and another 2 patients developed hypertension with advancement of fibrosis from F0 to F1.
Fassio et al. [16] conducted a study on 22 patients who had liver biopsy with evidence of NASH and found that 31.8% (7 patients = P group, progressors) had progression of liver fibrosis over a median follow-up of 4.7 years. The other group was 68.2% (15 patients = NP group, non-progressors) and did not progress over a median follow-up of 4.3 years. The rate of progression in the entire population was estimated as 0.059 fibrosis units per year (mean difference in fibrosis score divided by mean interval in years between the first and second biopsies = 0.32/5.34 = 0.059), the rate of progression in the P group was 1.85/6.59 = 0.28. There was no statistical difference as regards the clinical, biochemical, grade of steatosis, and grade of inflammation between the two groups except for the presence of obesity and higher BMI (progressor was more obese with higher BMI than the non-progressor) whether this was performed during the initial liver biopsy or the final liver biopsy. Within each group, the gradients between the final and basal results were not statistically significant as regards the clinical, biochemical, grade of steatosis, and grade of inflammation between the two groups including the BMI.
Adams et al. [7] conducted a study on 103 patients who had performed two liver biopsies with mean follow-up period of 3.2 ± 3 years (range = 0.7-21) between the first and the second biopsies. A total of 38 patients were progressors, 35 patients were stable, and 30 patients were regressors. No clinical or biochemical variables were statistically different among the progressors, stable, and regressors. The rate of fibrosis change varied from −2.05 to 1.7 stages/year and calculated as stated in the introduction. Using univariate regression model, the presence of diabetes, AST/ALT ratio, steatosis grades, and fibrosis stage were the only significant variables. By multivariate linear regression analysis and adjusting for age and BMI, only the presence of diabetes and earlier fibrosis stage were significantly associated with a higher rate of fibrosis progression. He also found no significant correlation between rate of progression and HOMA.
The findings of the present study demonstrate that HOMA2-IR has a positive and a statistically significant effect on progression of fibrosis among the different states. Running multivariate Poisson regression reveals that the main players for progression are the HOMA2-IR, LDL-chol, and systolic blood pressure explaining about 35-60% of variability in the rates of progression. However, HOMA2-IR has a negative effect that is not statistically significant on the rate of remission or regression from F1 to F0, from F2 to F1, and from F3 to F2, but it is statistically significant on the rate of remission from F2 to F0 and from F3 to F1. Poisson regression model explained that the same factors and their interactions were responsible for about 60-70% of variability in the rates of remission among the states. The high HOMA2-IR levels significantly decrease the effects of high LDL levels on the progression rate from F0 to F1 and from F3 to F4. Thus, this interaction can be a protective mechanism to slow down the progression rate of fibrosis. The low HOMA2-IR levels significantly increase the effect of low LDL levels on the remission rate from F1 to F0 and from F2 to F1. Thus, this interaction can be a protective mechanism to accelerate the remission rate of fibrosis. The rate of fibrosis decreases with the help of rigorous control of the blood level of insulin, glucose, cholesterol, and blood pressure. The high levels of systolic blood pressure significantly decrease the effect of low LDL levels on the remission rate of fibrosis from F1 to F0, from F2 to F1, and from F3 to F2. Thus, controlling the most harmful factors like hyperinsulinemia and hypercholestrolemia, even in the absence of strict control of hypertension, can still benefit repressing the fibrogenesis. Lifestyle modification, in the form of physical exercise and a low caloric diet, and controlling the risk factors greatly impact arresting the process of fibrogenensis.
The newly emerging anti-fibrotic drugs will also help physicians treat fibrogenesis. In the FLINT study conducted on 283 non-cirrhotic patients taking obeticholic acid (OCA), 25 mg daily; the improvement in the histology detected by NAFLD activity score (NAS) was two points or more with no deterioration of fibrosis, and 35% of patients taking OCA had a decrement in fibrosis score by at least one stage in comparison with 19%in the placebo arm. REGENERATE study (still in progress, with the estimated primary completion date is on September 2025 as shown on clinicaltrials.gov official site) will evaluate safety and efficacy of obeticholic acid (OCA) in NASH patients with fibrosis who are randomized to a daily dose of 25 mg, 10 mg, and placebo, with endpoints like amelioration of fibrosis by at least one stage and decaying of NASH with no deterioration of fibrosis. At 18 month of randomization, liver biopsy revealed statistically significant histological amelioration of fibrosis and decaying of NASH with no deterioration in fibrosis for both 10 and 25 mg doses. In the GOLDEN study, conducted on 274 NASH patients, 120 mg elafibranor taken daily for 52 weeks induced decaying of moderate to severe NASH in a meaningfully higher percentage of patients than placebo; these patients also showed lowering in fibrosis stage compared to non-resolving NASH patients. The RESOLVE-IT trial (last update was on 30 November 2020, as shown on clinicaltrials.gov official site, but the study is still in progress according to Guirguis et al. [24]) emerged in May 2020 had shown that 19.2% of patients, on 120 mg daily elafibranor, had NASH decay without deterioration of fibrosis compared to 14.7% in the placebo group, which was not statistically significant. Furthermore, 24.5% of patients had shown fibrosis amelioration of more than one stage compared to 22.4% in the placebo group, which was also not statistically significant. In CENTAUR trial, conducted over 289 patients taking cenicriviroc (CVC), 150 mg daily and placebo for 52 weeks, no comparative betterment in NAS between NASH group and placebo was seen; however, there was one stage or more amelioration of fibrosis with no deterioration of NASH in the group taking the CVC compared to placebo group. The AURORA trial (primary completion dates were October 2021 according to clinicaltrials.gov site while the completion date will be October 2028 according to Guirguis et al. [24]) will evaluate long-term safety and efficacy of 150 mg daily CVC for the treatment of fibrosis in NASH adult at 2 phases: the first has endpoint of at least one stage amelioration of fibrosis without deterioration of NASH at month 12, and phase 2 has endpoint that is cirrhosis, liver-related outcome as HCC, and all causes of mortality. In a small, open-label, randomized phase II trial including 72 biopsy-proven NASH patients (NAS ≥ 5 and stage 2-3 liver fibrosis) receiving 18 mg daily selonsertib for 24 weeks, there was a significant improvement in liver disease activity, fibrosis, stiffness, liver fat content, and progression to cirrhosis [25].
The distribution of the counts was Poisson distribution (mean = variance); that is to mean, these counts were equidispersed. However, all the counts showed excess zeros except for the transition from F0 to F1 where the zeros constituted 42% of the total count of this transition. Tlhaloganyang and Sakia found that the equidispersed counts data with excessive zeros can be modeled with Poisson regression, the best model to represent the data. Also, the AIC scores obtained by them after running Poisson regression, on their tested data whether simulated or real, were less than the AIC scores after running ZIP on the same datasets [26]. In this article, the predictors were normally distributed, and applying the restricted cubic spline transformation was used to better specify the functional form of these predictors. The raw predictors and the transformed predictors were highly correlated. But the condition number obtained from the transformed predictors is below 100, which is not harmful for the analysis as shown in the Results. Vatcheva et al. [27] highlighted the fact that the majority of researchers do not mention the multicollinearity diagnostics when running the regression models, discussed the causes and effects of this lack, and proposed some remedies to treat multicollinearity such as: principal component analysis, partial least squares regression, and ridge regression analysis. Akram et al. [28] used principal component ridge type estimator for the inverse Gaussian regression model. Many investigators such as, Liu [29], Kibria and Lukman [30], and Lukman et al. [31] had proposed different techniques to manage the multicollinearity problem between the predictors when running regression models. Some of them, who developed methods for Poisson regression, are Månsson and Shukur [32], Månsson et al. [33], Lukman et al. [34], Lukman et al. [35], and Qasim et al. [36]. In this paper, none of these methods were used as the Poisson model was mainly used to give preliminary vision about the effects of the high-risk factors on the transition counts. Also, it was not used for prediction, and the condition number was <100. Once the estimated counts were obtained, they were fed to the CTMC to estimate the transition rate matrix and transition probability matrix at any specified time point. Thus, physicians can follow a cohort of any patients in various states and obtain their state probability distribution at different time points.
The strength of this study is the conduction of multiple frequent repeated observations over a long period of follow-up on a large number of high-risk participants for developing NAFLD and performing a liver biopsy during each visit. Although this may be realistically infeasible during each visit, non-invasive techniques [37,38] can substitute the invasive liver biopsy. The advantage of techniques like MRI and machine learning [39], to assess the liver texture and correlate these findings with the histological findings in liver biopsy, can overcome this weakness. Liver biopsy can also be reserved in situations where noninvasive tests are inconclusive. These non-invasive tests decrease the number of liver biopsies each patient may encounter. The proposed follow-up period is too long to wait for the obtained results, which can be overcome by using adaptive clinical trials. The weakness of the study is the presence of dependency among the response variables which was not treated by the statistical analysis used in this study. A copula modeling discrete random vectors like the counts in this study can be used in future analysis. However, a copula of discrete vectors is not fully identifiable and thus causes serious inconsistencies [40], especially when modeling nine variables like the variables used in this study.

CONCLUSION
In the present study, running Poisson regression model is used to obtain the expected counts of transition among states. These counts are used as input into the homogenous CTMC. Using this CTMC, the transition rate matrix is estimated, and thus, the probability of progression of participants from specific state to another one at specific time point can be estimated by exponentiation of this rate matrix. This probability matrix at any specific time point multiplied by the initial probability distribution of a cohort of patients can be used to predict the number of the participants in each state later on at different time points. This predicted number of participants helps health policymakers and insurance managers allocate the human and financial resources to investigate and treat the high-risk patients for developing NAFLD. The Poisson regression model relates these high-risk covariates to the transition rates among states. Also, this approach can be used in the clinical trials to assess the effectiveness of the newly emerging anti-fibrotic drugs. The epidemiologists can utilize this methodology to estimate the effect of risk factors on the incidence rates of progression and remission among the different states of liver fibrosis due to NAFLD.
This hypothetical study is coded by stata-14 and is published in code ocean site with the following URL: https://codeocean. com/capsule/4752445/tree/v3.
The code to estimate the Q transition rate matrix for the observed transition counts using continuous-time Markov chains is published in the code Ocean site with following URL: https:// codeocean.com/capsule/6377472/tree/v2.
The code for solving the forward Kolmogorov equations using the estimated Q rate matrix is published in the code Ocean site with following URL: https://codeocean.com/capsule/ 7258626/tree/v1.
The dataset is present on IEEE Data Port site with the following URL: https://ieee-dataport.org/documents/fibrosis-nfld#files, with the following doi: 10.21227/dr5j-gs46.
A medical appendix briefly clarifies the stages of fibrosis due to NAFLD. See also the presentation (in the Supplementary Materials).

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/Supplementary Material, further inquiries can be directed to the corresponding author/s.

AUTHOR CONTRIBUTIONS
IA carried out the conceptualization by formulating the goals, aims of the research article, formal analysis by applying the statistical, mathematical and computational techniques to synthesize and analyze the hypothetical data, the methodology by creating the model, software programming and implementation, supervision, writing, drafting, editing, preparation, and creation of the presenting work.