Data-driven analysis and prediction of dynamic postprandial metabolic response to multiple dietary challenges using dynamic mode decomposition

Motivation In the field of precision nutrition, predicting metabolic response to diet and identifying groups of differential responders are two highly desirable steps toward developing tailored dietary strategies. However, data analysis tools are currently lacking, especially for complex settings such as crossover studies with repeated measures. Current methods of analysis often rely on matrix or tensor decompositions, which are well suited for identifying differential responders but lacking in predictive power, or on dynamical systems modeling, which may be used for prediction but typically requires detailed mechanistic knowledge of the system under study. To remedy these shortcomings, we explored dynamic mode decomposition (DMD), which is a recent, data-driven method for deriving low-rank linear dynamical systems from high dimensional data. Combining the two recent developments “parametric DMD” (pDMD) and “DMD with control” (DMDc) enabled us to (i) integrate multiple dietary challenges, (ii) predict the dynamic response in all measured metabolites to new diets from only the metabolite baseline and dietary input, and (iii) identify inter-individual metabolic differences, i.e., metabotypes. To our knowledge, this is the first time DMD has been applied to analyze time-resolved metabolomics data. Results We demonstrate the potential of pDMDc in a crossover study setting. We could predict the metabolite response to unseen dietary exposures on both measured (R2 = 0.40) and simulated data of increasing size (Rmax2= 0.65), as well as recover clusters of dynamic metabolite responses. We conclude that this method has potential for applications in personalized nutrition and could be useful in guiding metabolite response to target levels. Availability and implementation The measured data analyzed in this study can be provided upon reasonable request. The simulated data along with a MATLAB implementation of pDMDc is available at https://github.com/FraunhoferChalmersCentre/pDMDc.


Introduction
Diet is one of the main modifiable lifestyle factors that contribute to health (1).It is known that individuals exhibit large differences in response to diet and nutritional requirements, which motivates the research in precision nutrition (PN) for improved population health (2)(3)(4)(5).PN can be defined as providing the right diet to the right person at the right time, i.e., optimizing the dietary intake for individual needs, which requires prediction of individual responses to diet.
Prediction of metabolite response is still considered a difficult problem, since no mechanistic model as of yet covers the entire human metabolome, although important progress has been made in recent years.In a landmark study by Zeevi et al. (6), individual postprandial glycemic responses to various foods were successfully predicted from blood parameters, dietary habits, anthropometrics, physical activity, and gut microbiota highlighting the potential for personalized dietary recommendations to lower glycemic response.Later, in the PREDICT 1 study, meal composition, habitual diet, meal context, anthropometry, genetics, microbiome, clinical and biochemical parameters were used to predict individual postprandial responses in triglycerides, glucose, and insulin successfully to responses of food intake (7).Notably, both of these studies targeted only a small number of metabolites (one and three respectively) which are determinants for cardiometabolic diseases.However, many more metabolites have been associated with health and disease outcomes, and it will become a desirable trait in PN to guide a larger part of the metabolite levels to optimize response to, e.g., dietary exposures (8).Wang et al. recently used dietary data and microbiome to predict metabolome profiles at a static point in time using deep learning (9,10).However, deep neural network architectures are often challenging to interpret, which is desirable when connecting dietary contents and metabolite response.Furthermore, prediction of metabolite response over time will provide more information than single time points.However, such high-dimensional multivariate time series forecasting has not yet been explored in PN (11).
Rather than focusing on individual metabolite responses, an alternative approach within PN is to identify groups of individuals with similar latent metabolic phenotypes (i.e., metabotypes) and tailor dietary advice for the group (1,12,13).Identifying metabotypes relies heavily on dimensionality reduction to capture essential trends in the data that can subsequently be clustered.This has previously been done in matrix (single point in time) and multiway/tensor metabolomics data (time series) using matrix and tensor decomposition methods respectively, such as principal component analysis (PCA) and CANDECOMP/PARAFAC (CP) (14,15).Tensor decomposition methods like CP are useful for interpreting time series data in, e.g., metabolomics but are descriptive methods and not inherently predictive.Furthermore, CP which arguably is the more interpretable tensor decomposition (compared to, e.g., Tucker decomposition), has a multi-linear structure that does not fit all tensor data, leaving unexplained variance in the data.
Addressing both the problem of prediction and dimensionality reduction, dynamical mode decomposition (DMD) (16) has recently emerged as a promising tool for analysis of large-scale dynamical data.However, to the best of our knowledge, the usefulness of DMD has not yet been investigated in the analysis of time-resolved metabolite data.DMD works by identifying a low-rank linear dynamical system (LDS, a linear state space model) directly from data.The low-rank LDS captures the "essential" dynamics underlying the full system under study, and mapping between the low-rank system and the full system can be performed using a linear operator.Thus, DMD naturally incorporates both prediction (in the form of dynamical modeling) and dimensionality reduction (in the reduced rank of the model), and thus holds the potential to become a valuable tool within PN.Further, LDSs are well-studied, interpretable, and have been used extensively to model and forecast phenomena in a wide variety of research fields (17).Moreover, these systems come with a range of analytical perks such as stability and identifiability analysis, automatic control, and analytical state solutions (18).
Since its introduction, several variants of DMD have been proposed (19)(20)(21)(22)(23). Notably, parametric DMD (pDMD) allows several perturbations of the same system to be used to infer the parameters of a single DMD model (24).This makes it well suited to infer the metabolic system by integrating data from, e.g., different intervention diets.In addition, another development denoted DMD with control (DMDc) (22) allows incorporation of system input to the identified LDS, which is well aligned with, e.g., dietary interventions to separate the metabolic regulation from the dietary exposures.
Our aim was therefore to explore the use of DMD for the analysis of time-resolved metabolomics tensor data.Specifically, we employ a combination of pDMD with DMDc (denoted pDMDc) to predict the metabolite response of a given individual to new diets after having trained on several others, and to identify potential metabotypes from LDS trajectories.As a realistic setting for this method development, we used measured metabolomic data from a dietary intervention crossover study, with variability in four dimensions: metabolite, time, individual, and diet.The methodology was further evaluated using identically structured synthetic data from a large human metabolome model (25), containing two clusters of individuals (healthy and diabetic) used as ground truth for metabotyping, and responses from simulated dietary interventions as ground truth for prediction.The identification of potential metabotypes was performed by clustering latent dynamic states derived from pDMDc using measured and simulated data, and compared to clustering scores derived using CP.  1.
Baseline clinical measures along with anthropometric measures were recorded, including alanine aminotransferase (ALAT), aspartate transaminase (AST), gamma-glutamyl transferase (GGT), cholesterol, low-density lipoprotein (LDL), creatinine, thyroid stimulating hormone (TSH), and body mass index (BMI).On each test occasion, blood samples for metabolomics analysis were taken 8 times at one-hour intervals, including a baseline sample just before the meal was consumed.Metabolomics analysis was performed using GC-MS and 79 targeted metabolites were measured, resulting in 79 metabolite trajectories with 8 time points for three different diets for each individual.The main categories of metabolites contained in the data were: amino acids (n = 35), carboxyl acids (n = 6), lipids (n = 8), and carbohydrates (n = 16).The data can be viewed as a fourth-order tensor (four-way array)  ∈ × × ×  M T I D with M =79 metabolites, T =8 time points, I=17 individuals and D=3 diets.A graphical overview of the study design and experimental data is presented in Figure 1.The baseline measurement, x m t i d , , , =0 , was subtracted from each time series to focus the analysis on the postprandial dynamics in the data from the study intervention.For proper values on evaluation metric (R 2 ) we standardized the response per diet to unit variance.

Simulated data from a virtual metabolic human dynamic model
Data was simulated for two purposes; (i) to evaluate prediction performance of response to unseen dietary exposures using pDMDc, and (ii) to evaluate metabotyping with ground truth clusters using pDMDc and CP.We used a virtual metabolic human dynamic model for pathological analysis (25) to generate simulated data.The model consists of 202 ordinary differential equations modelling the human metabolome using 1,140 kinetic parameters.Inputs to the model are amounts and uptake rates of glucose and triglycerides, and the outputs are dynamic postprandial responses of 202 metabolites.
For the dietary response prediction study, we generated a dataset consisting of 17 healthy simulated individuals (same number as the measured data in Section 2.1.1)and 90 postprandial responses to different dietary exposures.For the metabotyping study, data from 50 healthy and 50 diabetic individuals were generated as postprandial responses to 3 meal interventions.The meals were generated by sampling the glucose and triglyceride content and their uptake rates from a normal distribution, as detailed in Table 2, "dietary parameters." Differences in individual metabolism were modelled using a selection of kinetic parameters impacting the diabetic characteristic, as detailed in Kurata (25), with mean and standard deviations set to produce typical responses from healthy and diabetic individuals (Table 2, "individual parameters").Initial values for the simulations were produced by running the model with the default food input (mean values of the dietary parameters in Table 2) for 10 h.
For each simulated dataset, the model simulated metabolite dynamics for 10 h, producing 60 time points sampled at 10-min intervals.Logarithmic down-sampling resulting in eight time points was done to make the simulated data more similar to the measured data and to save computing time when performing DMD.Metabolite concentrations with a standard deviation less than 10 −3 mM during the course of the simulation were excluded, leaving 130 dynamic Study design graphic of the three-armed crossover dietary interventional study.

Linear dynamical systems 2.2.1 Discrete linear dynamical systems
In this work, we describe the dynamics of metabolite responses using discrete time LDSs, defined by a coupled system of linear difference equations: is the system matrix representing the best-fit (smallest Frobenius norm) mapping between metabolite time points, z t l ∈  is a vector of l system inputs, is the best-fit mapping from the input to the metabolite dynamics.We also assume that the measured metabolites can be projected linearly (via the mapping  U M S ∈ ×  ) to a smaller set of latent dynamical state variables  x t S ∈  of size S (potentially simplifying system analysis), representing the "essential" dynamics of the system, as formalized in Equations (2a, 2b) ∈ ×  is the mapping from the input to the metabolite dynamics, and ˆM t x ∈  is the approximation of the data consisting of M metabolites.Given these model structures, it remains to infer the dimensionality and parameters of Equations (1, 2a, 2b) from the available data, which we do using DMD.

Conceptual description of the dynamic model
DMD is a method that learns a linear kinetic model structure, along with its parameters, from dynamical data, without any prior knowledge of the underlying system.As integral part, DMD also provides dimension reduction of the data, which in turn reduces the complexity of the learned model.In practice, this means that many correlated measured signals can be represented by a single model state, similar to PCA.
Data that measure the response of a system to some perturbation are suitable for the application of DMD.This fits well with the setting of this study, where the data consist of postprandial responses, measured as time-resolved omics features.The DMD model, representing the metabolic system, is learned using least squares regression, which also has the advantage that optimization of model parameters can be done analytically.This avoids the risk of finding local optima, which is a problem for many machine learning methods (27).Furthermore, the least squares solution can be calculated using the singular value decomposition (SVD), which also allows for dimension reduction and therefore reduction in model complexity.
Additionally, DMD can be extended to include a known input to the model (dynamic mode decomposition with control, DMDc).In this work, the model structure expressed in Equations (2a, 2b) is learned using DMDc, with dietary intake as input and measured postprandial response, in the form of high dimensional metabolomics time trajectories, as output.Furthermore, to include information about responses to different meals, an additional extension called parametric DMD (pDMD) can be combined with DMDc, resulting in the proposed pDMDc method.This development enables prediction of postprandial response to new diets.Since the dimensionality reduction property of DMD still holds for pDMDc, it also allows for identifying the differences in response between individuals in large groups of omics features.
A mathematical description of DMDc and pDMDc can be found in Sections 2.3 and 2.4, respectively.To give an overview of our proposed data analysis workflow, we summarize the two different applications, i.e., predicting the postprandial metabolite response and the identification of metabotypes, in Figure 2.

Dynamic mode decomposition with control
In the context of the dynamic metabolite measurements described in Section 2.1, vectors of metabolite samples , , , are stored column-wise in the matrices (commonly called snapshot matrices in the DMD literature), ( ) : Thus, one snapshot matrix contains the first T −1 measurements and the other one contains the last T −1 measurements.Note that here we assume that the metabolite measurements stem from a single dietary provocation to the metabolic system.
Further, we assume that the measurements in each time point have been generated from a linear time-invariant system (Equation 1), where each time point is a linear function of the previous measurement and a previous control/input signal.Using the example data, the input z t could represent, e.g., nutrient composition in a dietary intervention, and the data x t represent the postprandial metabolite responses to the intervention.To estimate the matrices A and B, we first construct the control input snapshot matrix, in Equation 5.
( ) Overview of the two applications for which we use our method.(A) Identification of a discrete linear dynamical system (LDS) using parametric DMD with control (pDMDc) allows for prediction of postprandial response of a new diet having trained on several others (Section 2.5).(B) Using pDMDc with a shared output mapping  U tot ( ) that links individuals in the population to the same LDS framework, we cluster latent state trajectories  x t i d , , to identify metabotypes (Section 2.6).These definitions allow us to formulate an equation analogous to Equation (1) but in full matrix form: Here, ′ X , X and Z are already known, so in order to solve for A and B, we restate Equation 6as We model the dietary input (represented by the nutritional composition of the intervention diet) as an impulse of size l to the metabolic system at the first time point, which implies that Z is set to zero everywhere apart from the first time point (u 1 ).Accordingly, the matrix Σ can be written as We now seek the best-fit solution, in the Frobenius sense, of the operator G , representing a column-wise concatenation of A and B. This is achieved using the Moore-Penrose inverse (denoted † ) via the SVD of Ω Σ = U W  .As commonly done in DMD, we can also choose to truncate the SVD to ) components (e.g., to reduce overfitting and sensitivity to noise), resulting in the reduced matrices ( ) . Applying the pseudo-inverse and truncated SVD to solve Equation 7gives us the matrix G as In the text below, will be used to denote rank-reduced versions of A and B. The resulting rank-reduced system retains the same size as the original system (Equation 1) but is less prone to overfitting to the given data.We now retrieve the approximations Â and B of the two linear operators A and B by dividing G into two parts: Here ( ) represent the bases for the metabolomics data and the dietary input, respectively, as where ˆt x denotes the approximation of the metabolite concentration x t .
At this stage, the system still represents the dynamics of each metabolite with a separate state.However, for simplified analysis we also aim to reduce the number of states of our LDS to a latent space representation with S ≤ M states (cf.Equations (2a, 2b)).The choice of S and V is dependent on the data and can be set equal to each other.To do this, we start with the SVD ′ , which we use to map the system onto the latent state space.The resulting system matrix, input matrix, and state vector for the reduced order LDS become  .This allows us to form the reduced LDS as Here, the initial state  x 1 is derived directly from data as 

Parametric dynamic mode decomposition with control
For the remainder of the manuscript, we will use an index-based notation.Using the data as an example, .Finally, we use , .Parametric DMD with control (pDMDc) can make use of higher order tensor data, such as cross-over interventions, where 3-way data ( Subsequently, we proceed with DMDc, completely analogous to Equations 9-12a, 12b, 12c but substituting ′ X (Equation 4) with X i, * ′ (Equation 15) and Ω (Equation 8) for Ω i, * (Equation 16), to obtain the matrices  A i ,  B i and  U i d , .For a detailed derivation, see the Supplementary material, Section 1.This gives the following system, analogous to Equations 13a, 13b: , , , , , Here the dimensions of the system are the same as for Equations 13a, 13b and the indices t, i, and d represent time, individual, and diet, respectively.The initial state of each individual and diet  =  .When using Equations 17a, 17b to predict the response to a new diet, we avoid projecting the states onto a lower-dimensional subspace (corresponding to setting S = M during the SVD of the snapshot matrix X i, * ′ and thus effectively omitting  U i d , from Equations 17a, 17b) but still rank-reducing the system (see Supplementary material Equation S8).This is done to reduce prediction errors since the interpretability of the model is not the focus.
To identify metabotypes, we take an alternative approach to deriving the LDS, using a mapping between the full and latent system that is shared by all individuals, thereby making state trajectories comparable between individuals.We start by forming the matrix by column-wise concatenation of data for all individuals, time points, and diets.We then perform an SVD of X tot ( )  and truncate the resulting matrices using . Now  U tot ( ) defines the mapping between the full system and a latent, reduced system, shared by all individuals.Next, we use  U tot ( ) to project the snapshot matrices X i, * and X i, * ′ onto this latent space, forming two new matrices according to Similar to Equation 16, we then form a matrix  Ω i, * according to Finally, using  X i, * ′ and  Ω i, * from Equations 18a, 18b, 19, we again proceed with DMDc according to Equations 9-12a, 12b, 12c, but this time substituting the snapshot matrix ′ X (Equation 4) with  X i, * ′ (Equations 18a, 18b) and Ω (Equation 8) with  Ω i, * (Equation 19), to obtain the matrices  A i and  B i .For a detailed derivation, see the Supplementary Material, Section 2. Together with the matrix  U tot ( ) , this allows us to write the system as, in Equations 20a and 20b , , , , ˆtot Here, the three diets are given distinct initial states  and initial inputs z d 1, to the same individual LDS given by the matrix  A i and input matrix  B i .

Predicting postprandial metabolite response to unseen diets
Prediction of response to unseen diets with pDMDc was done using measured data (Section 2.1.1)and simulated data (Section 2.1.2),where evaluation of performance was measured using R 2 on test sets.
Pooling of the measured data was done using the responses to the three dietary exposures for all individuals (n = 17), thus each response per individual is considered one "observation."Using the measured data, a training, validation, and test split of 60, 20, and 20% of the observation were used, respectively.To estimate the R 2 on the entire dataset, the order of observations in the entire dataset was randomly permuted prior to splitting, training, and evaluation.This resampling without replacement was performed 100 times and the average R 2 from all iterations was calculated as the final one.
To investigate if the inclusion of further dietary exposures hold the potential to improve the predictions, initially the same pooling of the data (Section 2.1.2) from 17 simulated individuals was performed.However, instead of using the same split as for the measured data we assigned a large test set of the response to 50 diets on 17 individuals and gradually increased the number of responses to diets in the training and validation set from 3 up to 40 diets.
Model complexity was chosen using the residuals of the training and validation data.When the root mean squared error (RMSE) of the validation data deviated more than 30% from the RMSE on the training data, the training stopped and the model that achieved the lowest RMSE was chosen.Model performance evaluation was measured using R 2 -values computed according to Equation 21. 1 where vec  ( ) represents the vectorized tensor  (entire test set) and  (entire predicted test set) denotes the predicted tensor while  denotes the average of all the values N in the tensor.

Metabotyping
Metabotyping was performed using both pDMDc and CP (representing a state-of-the-art decomposition method) on the measured data from the intervention study described in Section 2.1.1 and on the simulated data described in Section 2.1.2.To validate the use of pDMDc for metabotyping, we aimed to identify two previously published metabotypes (15) in the measured data.Using simulated data, metabotyping was performed to identify ground truth clusters of diabetic and healthy individuals as described in Section 2.1.2.
To identify metabotypes using pDMDc, we compared the resulting latent state trajectories from pDMDc (Section 2.4) between individuals by taking the pairwise cosine similarities c s i j d , , , and arranging them in a square matrix with I=17 rows and columns as in Equation 22:

∈ ×
 per state and diet.We proceeded by clustering the matrix using agglomerative clustering to identify groups of individuals with similar metabolite dynamics per state.In the case of measured and simulated data, the number of clusters to search for was known.
Model complexity in pDMDc for metabotyping, i.e., the choice of S and V (for simplicity we choose S V = ), was assessed by identifying the inflection point in the scree plot of the singular values derived from the SVD X using correlation.
1: ,1: , , 1 1 Clusters of metabolites in the covariance matrix were identified using agglomerative clustering with "Farthest distance" as method and one minus the sample correlation between points as distance metric in MATLAB (Statistics and Machine Learning Toolbox, version 7.10.0,R2022a, The MathWorks, Inc., Natick, Massachusetts, US).
The potential metabotype clusters identified using pDMDc were compared to clusters derived from the scores of the unconstrained CP (30), representing a state-of-the-art tensor decomposition method (Equation 25).
Here  M T I D .The factors are multiplied using the outer product ⊗ and form a tensor with four axes/modes and the F components are summed to reconstruct the tensor  .This is similar to PCA where the factors of different components are constrained to be orthogonal to each other and the tensor is constrained to only two axes, i.e., a matrix.We refer to p as individual scores, and the other factors as metabolite, time, and diet loadings (r, q, and h respectively).The data for each metabolite was scaled to unit standard deviation (31).The CP models were derived using the MATLAB N-way toolbox (32) and model complexity was chosen by selecting the model that explained the most variance without exhibiting two-factor degeneracies.
Using CP, clusters were identified using k-means on the individual scores p f and compared to clusters identified using pDMDc.Agglomerative clustering was performed on the similarity matrix derived using pDMDc since this clustering well on matrices similar to covariance matrices while the k-means was chosen when clustering CP scores, as it is a common method for clustering matrices with no particular structure (27).To assess the biological relevance of the clusters, we used ANOVA to find associations between clusters and clinical baseline and anthropometric measures.For strong associations (p < 0.05) with a plausible biological relationship between metabolites, clusters, and baseline measures, we infer that the clusters may correspond to potential metabotypes.
To compare variance captured using pDMDc and CP we calculated the correlation between orthonormal basis vectors  U M s tot 1: , ( ) and the metabolite loadings r f .Furthermore, we calculated correlation between the latent state trajectories  x T i d 1: , , and the time loadings q f to assess the dynamic behavior captured in each component and state.The trajectories and time loadings were compared to mean metabolite trajectories of grouped metabolites from clustering the data covariance matrix using correlation (Equation 24).

Predicting postprandial metabolite response to unseen diets
Using the baseline metabolite measurement and the dietary information (macronutrient amounts) we predicted the response of unseen diets in the measured data.Since the participants (n = 17) were only exposed to three meal challenges in the study, we pooled data from all individuals and considered each individual with their metabolite response to the three challenges as three "observations." Using the pooled data we were able to predict the test sets (10 observations) with 40% explained variance (R 2 = 0.4) (averaged over all resampling iterations).Figure 3A shows a predicted response to a unseen dietary exposure in six of the 79 metabolites, whereas Figure 3B shows the prediction and data of the entire test set as a scatter plot.
Using simulated data from 17 individuals (pooled data as in the measured data), we observed that increasing the number of exposures to diets in the training set increased R 2 on the large test set (170 examples and 50 diets) shown in Figure 4.When increasing the number of diets successively from 3 to 40, R 2 -values displayed a corresponding increase from around 0.38 to 0.65 (Figure 4).As input to the pDMDc model, we used the simulated metabolite baseline and the dietary information (glucose and triglyceride content and their delay coefficients explained in Section 2.1.2).

Metabotyping
Metabotyping was performed in measured data (Section2.1.1)to identify already published metabotypes and on simulated data (Section 2.1.2) to identify ground truth clusters of healthy and diabetic individuals.
The scree plot of the singular values (used to choose model complexity) resulted in an inflection point at four SVD components (data not shown), i.e., four latent pDMDc states.The correlation between averaged metabolite clusters and latent states was calculated to identify what metabolite category the states represented.The first state trajectories captured metabolite trajectories peaking at 2 h post ingestion (ρ = 0.88), the second captured slower dynamics, peaking at >7 h (ρ = 0.97), the third state captured oscillatory dynamics (ρ = 0.85), and the fourth state captured intermediate dynamics (ρ = 0.93), peaking at 3 h (Figures 5A,B).Using CP, only three components could be extracted that showed both interpretable dynamics and biologically relevant clusters in scores (Figure 5C).CP models using more than 3 components had degenerate solutions in accordance with previous findings from this data set (15).The CP time loadings reflected pDMDc states one (ρ = 0.90) and two (ρ = 0.99), but did not capture dynamics of state three (ρ = −0.06)and four (ρ = −0.54).Regularization of the metabolite mode was attempted, but clustering in scores revealed no relevant biological connection to baseline clinical parameters (data not shown).
Figure 6 gives a visual overview of the metabolite contribution to each state in pDMDc and components in CP (Figure 5), via column vectors of the shared output matrix  U tot ( ) (Section 2.4) and metabolite loading vectors (Section 2.5).The figure shows that large groups of metabolites (Figure 6) coincide with dynamic profiles (Figure 5A) in each given latent state and component of the models.The CP metabolite loadings (Figure 6C) coincide well in the first (ρ = 0.89) and second (ρ = 0.78) component with the first and second pDMDc state but states three and four correlated more poorly in the third component (ρ = −0.41 and ρ = −0.44,respectively).For the metabolite data used in this study, the first state was represented predominantly by amino acids, which had an early absorption peak at around 2 h (Figure 6, first column).The second state was represented predominantly by lipids   Frontiers in Nutrition 12 frontiersin.orga 100% match could be found using two-components but explaining less variance of the data (15).ANOVA analysis revealed that the clustering was significantly associated with baseline creatinine levels (p = 0.007).Finally, the amino acids that contributed the most to the first state showed differences in response between clusters (Figure 7C).When applying pDMDc and CP for metabotyping on simulated data we were able to identify simulated diabetic and healthy individuals (Figure 8).Using pDMDc, the scree plot showed inflection point at four states (data not shown) and diabetic clusters were identified in states two and three but were separated most clearly in state four (Figure 8A).Although ground truth clusters could also be found using CP (Figure 8B), more than three components could not be modelled without factor degeneracy even though clustering of the covariance matrix revealed dynamics which pDMDc could model but not CP (data not shown).The most dominating metabolites in the fourth state displayed clear cluster separation (Figure 8C).

Discussion
In this work, we explored combining two variants of DMD (DMD with control and parametric DMD) for the purpose of prediction and classification of metabolomic data from multiple dietary challenges.We demonstrated the utility of the method in two use cases: (i) predicting measured and simulated and postprandial response to new diets based on data from prior dietary interventions (ii) identifying metabotypes in metabolomic data from a dietary intervention crossover study and simulated data.We show that it is possible to predict responses to new dietary exposures using pooled measured and simulated data.Additionally, we showed that prediction performance is increased when adding more diets to the data available for training the DMD model.Furthermore, we showed that metabotyping can be performed using the same pDMDc model, and that the clustering results are comparable to those produced by the state-of-the-art method CP.To our knowledge, this study represents the first application of DMD to metabolite data.
Using pDMDc to predict the simulated data set from the virtual human metabolic model, we showed that the predicted variance could be improved significantly by adding more diets (Figure 4).This was done since no more than three dietary exposures were available in the measured data.The result indicates that using more dietary exposures from dietary intervention studies could improve predictive performance, and that the pDMDc algorithm could be useful for predicting large parts of the measurable metabolism with high precision.However, the efficacy needs to be confirmed in large-scale multi-diet intervention studies.We showed that pDMDc is interpretable in terms of what metabolite dynamics each state represents (Figures 5, 6), in contrast to deep learning methods that often lack an explicit representation of metabolite dynamics, often making interpretation difficult (33).Furthermore, pDMDc is linear and built on well-understood mathematics used in many fields of physics and mathematics, including linear control theory.Thus, pDMDc provides the potential to approach PN by estimating the optimal dietary input given a target metabolite trajectory (18).This could potentially help find the best personalized diet in a population with differential responders.To do this, a multi-diet intervention would be needed for the model to learn the individual responses in a large set of metabolites to varied diets.Thereafter, a priori known target levels of important biomarkers would be set and used together with the trained model to calculate what dietary input is required to meet those levels.This is analogous to existing methodology in automatic control, which is well understood and widely applied to LDSs.While this is a relatively easy problem in cases where we have perfect knowledge of the system under study, it should be noted that pDMDc only provides an approximation to the metabolic system based on a limited number of postprandial responses.It is therefore very important that the dietary challenges are as varied as possible, to capture as much of the systems capabilities as possible.Other machine learning methods, like random forest and network models, are not as well studied in the context of automatic control, and LDSs are preferred when applicable (18,34).Additionally, the proposed pDMDc methodology uses differences in model states (roughly corresponding to groups of metabolites) to identify metabotypes.This means that it naturally highlights metabolites that could potentially be used as biomarkers for metabotyping, as well as giving suitable time points for measuring these metabolites after a dietary provocation.
Metabotypes were successfully identified with pDMDc in measured and simulated data by clustering latent state trajectories derived using a shared output map  U tot ( ) (Equations 18a, 18b).
Imposing the shared output map makes the individual state trajectories comparable, enabling clustering of individuals which to our knowledge has not been done before using this methodology.The high correlations (Section 3.2) between trajectories in data (Figure 5A), latent states (Figure 5B), and CP time loadings (Figure 5C) show that pDMDc can be an interpretable model in terms of visually inspecting what each latent state represent.This is further supported in that metabolite Metabotype clusters of the cosine similarity scores overlapped well with clusters derived from CP in already published metabotypes (15).However, using simulated data ground truth clusters (healthy and diabetics) were identified with highest classification accuracy in the fourth pDMDc state, while they were identified in all three CP components.This indicates that CP may be superior to pDMDc when it comes to clustering.Furthermore, CP has the advantage of enabling an overview of the data in one plot (subplots of factors per component (35)).In contrast, latent pDMDc states must be clustered per state and per dietary exposure and a full overview of the compressed data using the model is not trivial to produce.However, unconstrained CP can degenerate in spite of not all dynamics from the data being accounted for in the CP model.This problem is not present using pDMDc since model complexity depends on the SVD which does not degenerate due to orthogonal components.Although divergent predictions can occur in DMD model due to instability of the LDS, this can be mended by scaling eigenvalues (20).To avoid degeneracy problems, constraints on the CP model can be imposed, such as orthogonality constraints on the metabolite mode making it more similar to the orthonormal basis  U tot ( ) in pDMDc.However, in the present study we focused on comparing our method to unconstrained CP, as it is unique up to permutation and scaling ambiguities, which is not the case when imposing constraints (31,35).When metabotyping performed on static markers and related to disease outcomes, which is often the case (11), it is far from certain that individuals within the same metabotype will respond similarly to targeted food.Conversely, identifying differential responders to meals prior to relating them to outcomes, as is done using pDMDc, may increase the likelihood that the individuals within each identified group respond similarly to a tailored diet.Indeed, the more diets the individuals are exposed to, the more likely it is that the groups of differential responders represent true metabotypes, since more information of their metabolic system is uncovered.An added advantage of pDMDc is that it allows clustering of differential responders and prediction of response to new diets based on the same model, contrary to purely predictive or descriptive models like random forests and CP.
Limitations include that to our knowledge there is currently no well-established method for forecasting multivariate metabolite time series using baseline values and dietary information.Consequently, no comparison to the state-of-the-art was possible in the prediction case.Additionally, the dietary inputs were macronutrients, Clustering of metabolic response to diet using pDMDc and CP to infer metabotypes (red and blue lines and dots).

Conclusion
We have developed a method for analysis of time-resolved metabolomics data, based on a combination of parametric DMD and DMD with control for analyzing time-resolved omics tensor data.To the best of our knowledge, this is the first use of DMD as a tool for clustering and prediction in metabolomics.The method was applied to measured and simulated data to predict metabolite responses of unseen dietary exposures using the metabolite baseline and the dietary information.Using simulated data, we showed that metabolite responses to dietary provocation could be predicted with high accuracy.Additionally, the pDMDc model was used to infer metabotypes from experimental and simulated data by clustering LDS state trajectories, resulting in agreement with previously published outcomes, although CP was shown to be a stronger metabotyping tool.The method thus has strong potential for PN applications as it learns the relationship between dynamic postprandial response and dietary exposures.Furthermore, it has the potential to provide optimal tailored diets, given target levels in biomarkers of interest.However, this potential still must be demonstrated using experimental data from large scale studies using diverse intervention diets.Even though the pDMDc methodology presented in this article was developed and applied in the context of personalized nutrition and multiple dietary exposures, it should be noted that prediction and identification of differential responders in other time-resolved omics or high dimensional biological data may also be suitable use cases.Future directions of development may include refining the estimation of the input dynamics such that it can provide a more realistic impact on the dynamics and investigating optimal dietary input for a priori known metabolically healthy target levels.Finally, for improved prediction performance and accuracy of pDMDc-tailored diets, a deeper investigation of which type of nutritional information that should be used as input is needed.
the column vector of the metabolite postprandial response at the t:th time point, i:th individual, and d:th diet.We use the colon notation (:) to indicate that a range of time points are selected resulting in the matrix x T i d M T 1: , , ∈ ×  and the * to indicate the concatenation of responses of D diets for the i:th individual as X i state trajectories, the indices i and j represent distinct individuals producing a square similarity matrix c s I data for each metabolite averaged over all individuals and diets M T X × ∈  (Equations 23, 24).The dominant dynamical metabolite profiles were compared with latent state trajectories  x s T i d , : , , 1

and hD∈
 denote factors representing the modes (metabolites, time points, individuals, and diets respectively) of the tensor  ∈ × × ×

(
Figure 6, second column), with a late peak after 7 h (Figure5, second column).The third state was represented predominantly by carbohydrates (Figure6A, third column), with an oscillatory dynamic behavior.The fourth state was represented by a mix of carbohydrates, carboxyl acids, and amino acids among others (Figure6A, fourth column), with a fast dynamic behavior (peak around 3 h).The state trajectories  x i d , obtained from the 4-state pDMDc model were clustered state by state using agglomerative clustering on the

FIGURE 3 (
FIGURE 3 (A) Dynamic metabolite trajectories for training (gray) and holdout test (red) observations, exemplified for 6 out of 79 metabolites.Here dots are data and lines are model prediction (red) or reconstruction (grey).(B) Entire test data and prediction of test data as scatter plot for one of the crossvalidation iterations.The line represents the perfect match between data and predictions.

FIGURE 4
FIGURE 4 Prediction of responses to simulated diets using pooled individuals as in Figure 3. (A) Prediction metric R2 of a large test set with increasing number of diets in the training set (5 iterations of scrambling the examples prior to splitting training and validation).(B) Prediction of test example (red) and training examples (grey) shown in six metabolite (out of 130 in total) responses.Here dots are data and lines are model prediction (red) or reconstruction (grey).The predictions are made using 40 diets in the training set.The subscripts GI,L, AdT, and MS stand for metabolites modelled in gastrointestinal tract, liver, adipose tissue, and skeletal muscle, respectively.

FIGURE 5 (
FIGURE 5 (A) Average measurement trajectory per metabolite (grey lines) and mean measurement trajectory per dynamical class (red line) obtained from clustering of the covariance matrix of the averaged data per metabolite.(B) Individual state trajectories for the pickled herring diet (grey lines) and median of individual trajectories per state (red line).(C) CP time loadings (q from Equation23) for the 3-component CP model.

FIGURE 6 (
FIGURE 6 (A) Column vectors of  U tot ( ) color-coded by metabolite category, describing the contribution of metabolites (similar to PCA loadings) to the observed states.(B) CP metabolite loadings (r from Equation25) color-coded by metabolite category, describing the metabolite contribution to each CP component.

FIGURE 7
FIGURE 7 (A) The individual state trajectories of the first state in response to the meat diet, using four latent states.(B) K-means clustering of CP scores.(C) Raw data of amino acids contributing most strongly to the first column vector of  U tot ( ), color-coded according to the clustering.10.3389/fnut.2023.1304540Frontiers in Nutrition frontiersin.orgrepresenting a highly simplified encoding of the nutritional content of the meals.More informative encodings, e.g., included food items and type of food preparation, might improve prediction and clustering results.Furthermore, the measured dataset only included 17 individuals each with 3 dietary challenges, hampering generalization of results from this part of the study.Finally, no ground truth metabotype clusters were available using the measured data, yielding a proxy validation by identifying previously published clusters.

FIGURE 8
FIGURE 8 Clustering of state trajectories and CP scores to identify the ground truth simulated diabetic (blue) and healthy (red) patients.(A) Individual DMD state trajectories of the fourth state using four latent states.(B) CP scores clustered using the three-component model.(C) Raw simulated data of metabolites with the strongest contributors to the fourth column vector of  U tot ( ) , color-coded according to the clustering.The subscripts p, h, l, and GI stand for metabolites modelled in plasma, heart, liver, and gastrointestinal tract, respectively.

TABLE 1
Macronutrients of the intervention diets.

TABLE 2
Individual parameters: description of parameter distributions for simulated postprandial responses of diabetic and healthy individuals, drawn from normal distributions with the indicated mean and standard deviation.
Meal parameters: the four parameters governing the input to the simulated metabolic system, generating different diets from the normal distribution.The values are indicated in mean and standard deviation.MMc stands for Michaelis-Menten constant.10.3389/fnut.2023.1304540Frontiers in Nutrition 05 frontiersin.org We do this b4,8)ncatenating data from all diets, forming the new snapshot matrices X i, * , X i, * ′ and Ω i, * per individual (analogous toEquations 3,4,8)according to Equations 14, 15, and Equation 16 I M T × × ) are collected for multiple interventions into 4-way data ( D I M T × × × ) (24).Thus, pDMDc has the potential to incorporate several provocations of the metabolic system into the same dynamic model, predict metabolite response to a new diet, and also identify groups of differential responders by clustering the individual latent state trajectories.Using pDMDc on the