Non-invasive Estimation of Atrial Fibrillation Driver Position With Convolutional Neural Networks and Body Surface Potentials

Atrial fibrillation (AF) is characterized by complex and irregular propagation patterns, and AF onset locations and drivers responsible for its perpetuation are the main targets for ablation procedures. ECG imaging (ECGI) has been demonstrated as a promising tool to identify AF drivers and guide ablation procedures, being able to reconstruct the electrophysiological activity on the heart surface by using a non-invasive recording of body surface potentials (BSP). However, the inverse problem of ECGI is ill-posed, and it requires accurate mathematical modeling of both atria and torso, mainly from CT or MR images. Several deep learning-based methods have been proposed to detect AF, but most of the AF-based studies do not include the estimation of ablation targets. In this study, we propose to model the location of AF drivers from BSP as a supervised classification problem using convolutional neural networks (CNN). Accuracy in the test set ranged between 0.75 (SNR = 5 dB) and 0.93 (SNR = 20 dB upward) when assuming time independence, but it worsened to 0.52 or lower when dividing AF models into blocks. Therefore, CNN could be a robust method that could help to non-invasively identify target regions for ablation in AF by using body surface potential mapping, avoiding the use of ECGI.


INTRODUCTION
Atrial fibrillation (AF) is the most common type of arrhythmia in clinical practice, affecting more than 33 million patients in the world (Chugh et al., 2014). AF is also a condition that increases the risk of the patients to suffer embolism, cardiac failure, stroke, and in the worst of cases, death (Fuster et al., 2006). Therefore, one of the clinical goals in AF patients is to restore sinus rhythm. This objective can be achieved by pharmacological treatment (Lip and Tse, 2007), but termination of arrhythmic processes is usually accomplished by ablation of the cardiac tissue. Main targets of ablation are AF onset locations and drivers responsible for AF perpetuation (Guillem et al., 2016).
Previous human in vivo research showed different strategies to locate AF drivers and guide pulmonary veins isolation (PVI). In the case of invasive mapping procedures (Narayan et al., 2012;Krummen et al., 2017;Navara et al., 2018), several catheters are introduced inside the atrial chambers to record from 8 to 128 simultaneous electrograms (EGM) (Narayan et al., 2012). Despite the number of intracardiac signals, the large distance between catheter sensors (up to 1-2 cm) and the complex atrial anatomy limits the capability of intracardiac mapping systems to characterize the global electrical activity in AF (Oesterlein et al., 2016). Non-invasive procedures based on ECG imaging (ECGI) have been also tested to guide PVI (Haissaguerre et al., 2013(Haissaguerre et al., , 2014Dubois et al., 2015). However, PVI success rate gets lower in persistent AF, and phase singularity (PS)-guided ablation is suggested to be a reliable alternative for these more complicated cases (De Greef et al., 2018;Rottner et al., 2020).
Although ECGI has not been validated during AF propagation patterns, it has been demonstrated as a promising tool to identify AF drivers and guide ablation procedures (Cuculich et al., 2010;Haissaguerre et al., 2013Haissaguerre et al., , 2014Pedrón-Torrecilla et al., 2016;Rodrigo et al., 2016). ECGI combines both numerical modeling of the bioelectric properties of the thorax and signal processing, with the aim of reconstructing the electrophysiological activity on the heart surface by using a non-invasive recording of body surface potentials (BSP) (Brooks and Macleod, 1997;Gulrajani, 1998). However, the inverse problem of ECGI has several drawbacks. First, it requires an accurate mathematical modeling of both atria and torso, mainly from CT or MR images. Next, the inverse problem of ECGI is ill-posed because the propagation between the epicardium and the torso implies information loss due to signal attenuation (Rodrigo et al., 2014), and BSP are also blurred compared to the signals on the heart due to the laws of electromagnetic field theory. Therefore, regularization methods are needed to obtain reliable and stable epicardial potential reconstructions (Tikhonov and Arsenin, 1977;Oster and Rudy, 1992;MacLeod and Brooks, 1998;Pedron-Torrecilla et al., 2011;Milanic et al., 2014). For these reasons, inverse problem-based approaches still need further improvement.
In the last decade, machine learning (ML) and deep learning (DL) techniques have undergone considerable development in bioengineering, and this include novel research in AF. For example, DL has been used in AF detection by using recurrent neural networks (RNN) and convolutional neural networks (CNN) (Xiong et al., 2017), by STFT, stationary wavelet transform and CNN (Xia et al., 2018), and in the detection of individuals at risk of suffering from Paroxysmal AF by CNN (Pourbabaee et al., 2018). However, most of AF-based studies do not include the estimation of ablation targets. Nonetheless, recent research showed that ML and DL methods can be also used in more complex tasks, like heart surface potentials estimation from BSP using autoencoders (Bujnarowski et al., 2020) and rotor identification from 12-lead ECG using decision trees (Luongo et al., 2021).
Therefore, in this study we propose to model the location of AF drivers from BSP as a supervised classification problem. We used CNN, which accounts for spatial characteristics, to address the location of AF drivers from previously labeled realistic computerized AF models (Figuera et al., 2016;Cámara-Vázquez et al., 2020).
The remaining of the study is organized as follows. In Section 2, we introduce the computational models used for this study, the experimental set-up, performance metrics, and DL architecture.
Final results are summarized in sections 3 and in section 4 main conclusions are presented.

Geometric Models and EGM Computation
Realistic 3D model for the atrial anatomy was composed of 284,578 nodes and 1,353,783 tetrahedrons (673.4 ± 130.3µm between nodes) (Dössel et al., 2012) that considers a simplified single endocardium-epicardium layer for the atrial tissue.
We simulated 13 different AF propagation patterns in both left atria (LA) and right atria (RA), with different complexity and driver positions: posterior left atrial wall (PLAW), left inferior pulmonary vein (LIPV), left superior pulmonary vein (LSPV), right inferior pulmonary vein (RIPV), right superior pulmonary vein (RSPV), right atrial appendage (RAA), and right atria free wall (RAFW). Sampling rate of the signals was fs = 500Hz, while their duration ranged from 2 to 5 s.
The final computerized models were comprised of N = 2, 048 nodes for atria, and M between 2,206 and 3,970 nodes for torsos (10 different geometries were used), under the assumption of a homogeneous, unbounded, and quasi-static conducting medium by summing up all effective dipole contributions over the entire model (García-Molla et al., 2014;Figuera et al., 2016;Pedrón-Torrecilla et al., 2016;Rodrigo et al., 2017). Therefore, the EGMs of the entire model were computed as: where V( − → r ) is the EGM signal at the measuring point, V m is the transmembrane potential distribution across the atria, − → r is the distance vector between the measuring point and a point in the tissue domain, and r is its corresponding scalar distance. The transmembrane potentials were defined in a scattered 3D mesh, so the gradient was computed by interpolating a quadratic function that involves two surrounding points (Lawson, 1984): V m,i −V m,j = c 1 x+c 2 y+c 3 z+c 4 x 2 +c 5 y 2 +c 6 z 2 +c 7 xy+c 8 yz+c 9 xz where V m,i and V m,j are the transmembrane potentials at the points i and j; x, y, z are the incremental Cartesian coordinates from j to i, and coefficients c 1 to c 9 were computed by the least square method in, at least, nine neighboring points of each location.

Atrial cell model
The atrial cell model used is based on the one proposed in Nygren et al. (1998) and Koivumäki et al. (2014), where the electrical activity of a single myocyte is described in terms of their transmembrane potentials and ionic currents: where V is the transmembrane potential, I ion is the transmembrane ionic currents, and C m is the cell membrane capacitance. Then, the electrical propagation across the atrial tissue was simulated by including the transmembrane currents caused by the intercellular Gap junction current (due to the transmembrane potential gradient) in the previous formulation: where V k is the transmembrane potential at node k, V i is the transmembrane potential at the neighbor node i, D k,i is the diffusion coefficient between the node k and i, and d k,i is the distance between those nodes. Since the atrial electrical conduction is anisotropic (its velocity is higher in the longitudinal fiber orientation than at transverse), the diffusion coefficient D k,i that modulates the intercellular ionic current is determined as follows: where α is the angle between the longitudinal fiber orientation and the vector linking nodes k and i, and D long and D trans are the longitudinal and transverse diffusion coefficients, respectively (Rodrigo, 2016). Fibrotic and scar tissues were modeled by setting the diffusion values of the involved nodes to 0. In the case of the fibrotic tissue, a certain percentage of random nodes were disconnected, depending on the pattern to simulate. The final system of differential equations was solved by Runge-Kutta integration (using NVIDIA Tesla C2075 6G).

Atrial Fibrillation Driver Detection as a ML Classification Problem
We proposed to address the location of AF drivers as a supervised classification problem. We divided the atria into seven regions (Figure 1, right) where the AF driver can be found (Haissaguerre et al., 2014). Each region, which ranged from 1 to 7, represents a class to which the AF driver belongs. To obtain labeled data, AF driver location from the computerized model was manually classified into one region for each time-instant. An additional label (0) is assigned when no driver is found.

Body Surface Potentials (BSP)
Input data for DL algorithms are simulated BSP from each of the 10 torso geometries available. To compute the initial BSP y, the forward problem of ECGI was solved by computing the corresponding M × N transfer matrix A using the boundary element method (Barr et al., 1977;De Munck, 1992;Pedrón-Torrecilla et al., 2016): where y t are the BSP from each time instants. Those simulated BSP must be referenced to the Wilson Central Terminal (WCT). This step is essential since real ECG recordings are referenced to this point due to the electrical noise of the ground, and it can be mathematically represented as: where ECG notref is the not-corrected ECG, ECG WCT is the WCT signal, computed as the average of the ECG signals at the WCT points, and ECG is the final corrected ECG. Therefore, if we apply the same methodology to BSP, y t referenced signals can be computed as: where N WCT is the number of WCT points, and M WCT is a matrix of zeros except the rows that correspond to the WCT leads, that have the same values as the corresponding rows of the A matrix (Rodrigo, 2016). Therefore, to directly compute the BSP referenced to the WCT, it is possible to compute a corrected A CTW matrix as: FIGURE 2 | Construction of the different tensor layouts. Above: 3-channel tensor. Below: 1-channel tensor.
Once the WCT-referenced simulated BSP were computed, they were corrupted with additive Gaussian noise and filtered using: fourth-order bandpass Butterworth filter (fc 1 = 3 Hz and fc 2 = 30 Hz) (Figuera et al., 2016;Pedrón-Torrecilla et al., 2016). Signal-to-noise ratios (SNR) ranged from 5 to 50 dB. Finally, a set of 64 electrodes from each torso were chosen to represent a realistic multi-electrode vest used in electrophysiological studies, as shown in Figure 1 (left).

Image Representation of BSP
To obtain final input data, we represented the layout of electrodes in Figure 1 as images. For this purpose, we build two different types of tensors for each time instant (see Figure 2): • 3-channel tensor. This first approach consists on creating 3D matrices of shape (6 × 4 × 3). The first and last channel contain the BSP of the torso and back, respectively (24 electrodes each), whereas the second channel contains BSP from the sides (16 electrodes distributed on the last 4 rows, while the rest are filled with zeroes). • 1-channel tensor. This second approach consists on organizing the BSP in a single-channel 2D matrix of shape (6 × 16 × 1). To do that, we considered a multielectrode vest as a cylinder that was unrolled to a 12-column flat layout. Then, we added two additional mirror columns to each side of the matrix to represent the fact that the first and last columns of the tensor are in touch in a real 3D geometry. Finally, since sides electrodes are four for each column (instead of 6), the empty values are filled with the mean of the three nearest electrodes.

Custom CNN
CNN are a type of DL algorithm used mainly in image recognition and image classification. Comparing with other types of DL algorithms, this approach has a superior ability to extract features, increasing classification performance (Krizhevsky et al., 2017). CNN-based architectures consist on the following layers: • Input layer. In this case, input data are an image, i.e., a 3D matrix. This input matrix is also named as tensor. • Convolutional layer. In this type of layer, a filter of size (s, t) is applied to the input tensor. The main aim of the convolutional layer is to extract feature maps that can reflect certain characteristics of interest (edges, shape detection, etc.) (Li et al., 2019). This filtering process is performed using the convolution operation, which can be mathematically represented as: where a i,j is the output (activation) on coordinates (i, j), x i+m,j+n is the value of the input tensor in coordinates (i + m, j + n), w m,n is the coefficient of the filter in coordinates (m, n), b is the bias, and f (·) is a nonlinear activation function. • Pooling layer. This type of layers is frequently used between convolutional layers, since they reduce the amount of information that convolutional layers generate. There are several ways to implement a pooling layer, and one of the most used is the max-pooling. In this case, a non-overlapped window is applied to the input feature map, and the output is the maximum value of the considered window. If we consider a square window, the output feature map shape will be reduced in a 1/m factor, where m is the dimension of the window. • Dense layer. After applying a certain combination of convolutional and pooling layers, a multilayer perceptron (MLP) can be applied afterwards. Therefore, it is necessary to create an unidimensional vector from the output of the convolutional network (flatten operation). As explained before, the output layer of the MLP will have one unit for each value the network should output.
The proposed CNN-based architecture is shown in Figure 3A.
Input data are tensors of shape (150 × 152 × 3) or (78 × 192 × 1), depending on the tensor architecture evaluated. We used three convolutional layers with 32, 64 and 64 filters with (3 × 3) size. ReLU activation function was used in both convolutional layers. Max-pooling is applied after each convolutional layer [(2,2) window size]. Final dense layers are composed by layers of (128, 64) units (ReLU activation function), while the output layer is a Softmax layer composed by 8 units, one for each label. Finally, we used dropout between dense layers (0.6 dropout rate) to avoid overfitting.

Pre-trained CNN
Finding the right parameters for a CNN (number and dimensions of layers, pooling, etc.) can be a very complicated task, and training this type of network is also very computationally expensive. To address this task, there are several models that were previously trained with a certain dataset that can be also used to solve similar problems. This approach is called transfer learning (Pan and Yang, 2009). In this study, we decided to adapt the DenseNet121 (DN121, Huang et al., 2016), InceptionResNetV2 (IRNV2, Szegedy et al., 2016), and Xception (Chollet, 2016) models to our problem. These models were trained with the ImageNet database (Russakovsky et al., 2015), comprised of 14 million images corresponding to 1,000 different classes. Therefore, since we want to obtain an output with the prediction for 8 cases instead of 1,000, we added several fully connected layers to the output of the pre-trained models. This adaptation is showed in Figure 3B. Dense layers were composed of (128,64) units (ReLU activation function), while the output layer is a Softmax layer composed by 8 units.
Finally, it is essential to remark that these pre-trained models require 3-channel input tensors. Therefore, to use the previously described 1-channel tensor, we decided to repeat the 2D matrix twice to obtain a 3-channel tensor [(78 × 192 × 3) shape].

Performance Metrics
To assess the performance of the implemented DL models, we used four different metrics: • Accuracy (Acc). It is measured as where TP full is the total number of well-classified drivers (true positives), and Total is the total number of drivers. • Cohen's Kappa (κ). It is a robust statistic used for rating reliability testing (McHugh, 2012), and is computed as where p o is the relative agreement among raters (identical to accuracy), and p e the hypothetical probability of chance agreement. A score of 1 represents a perfect agreement between raters, and a score of 0 represents the agreement that can be expected from random chance. Scores less than 0 mean that there is less agreement than chance. • Sensitivity (or true positive rate, TPR). It is the proportion of positive samples for a class that are correctly classified. It is measured as: where TP is the number of well-classified drivers for a certain class (true positives), and FN is the number of drivers that was wrongly classified to another class (false negatives). It is considered a measure of how well a test can identify true positives.
• Specificity (or true negative rate, TNR). It is the proportion of negative samples for a class that are correctly classified. It is measured as: where TN is the number of drivers that did not belong to a certain class that were correctly classified (true negatives), and FP is the number of drivers that was classified as positive for the same class, but they correspond to another class (false positives). It is considered a measure of how well a test can identify true negatives.
Training and test sets were split using two different approaches: • Time independence of input data. In this case, each time instant is considered as an independent sample. The whole data set was then randomly split into training (80%) and test (20%) sets, using hold-out validation during the training process (20%). In this scenario, samples from each AF model are randomly distributed to the training, validation, and test sets, but training, validation, and test processes are carried out with completely different samples, i.e., samples from the training set are not used in the validation and test sets. • Division of AF models in consecutive blocks. To describe a more realistic scenario, we divided each AF computerized model in three independent blocks: training (first 80% of the length of the signals), hold-out validation (last 20% of the training set), and test (final 20% of the length of the signals). Therefore, these three blocks contain consecutive samples, i.e., the training process is carried out using the first block of time instants, and test is achieved with the last block of samples.
On the other hand, we had to address imbalance of data. In our computerized models, there are several atrial areas where the driver only appears in one single AF computerized model, whereas other areas are more susceptible of containing a driver. Therefore, there are several atria regions that are over represented in the data set. To address this problem, we tried to weigh the classes accordingly to the probability of occurrence (class weight). It applies a higher penalization in wrong classifications in under-represented classes.
Regarding the training process, we trained the different models for a maximum number of epochs of 1,000, with reduction in the learning rate and early stopping if there is no improvement in the loss value during a certain number of training epochs. Moreover, we use the model checkpoint technique to get the best model obtained during the training process according to a metric of interest (loss value for the custom CNN, and validation loss for the pre-trained CNN). Finally, in the case of the pre-trained models, we followed two different training approaches: training the dense layers and the last two convolutional layers (while freezing the rest of the CNN, partial training), and re-training the whole pre-trained network (full training).

Performance of CNN
The first carried out experiment consisted on assessing the obtained performance with each tensor architecture. For this purpose, we trained the CNN architecture explained in Figure 3 with 1-channel and 3-channel tensors. In both cases, tensors were obtained from the same BSP dataset. Therefore, we got two different CNN models to be compared. Moreover, to simulate a realistic scenario, we used noisy BSP (SNR = 20 dB). Table 1 shows the results considering each time instant as an independent sample and division in blocks. In the first case, we were able to correctly locate the 95 and 91% of drivers in the training and test sets, respectively (Cohen's kappa of 0.92 and 0.88) when using 3-channel tensors. Results obtained with the CNN model trained with 1-channel tensors were very similar, with an accuracy of 0.94 and 0.91 in training and test sets, respectively (Cohen's kappa of 0.92 and 0.88).
On the other hand, results for division in blocks were worse. In this scenario, accuracy was 0.97 and 0.49 in training and test sets when training with 3-channel tensors, respectively (Cohen's kappa of 0.96 and 0.33). Metrics when training with 1-channel tensors were similar in the training set but slightly better in the test set (accuracy of 0.52 and Cohen's kappa of 0.37), but worse compared with time independence approach. These results suggest that we are overfitting the model to the training set. Figure 4 shows the confusion matrix obtained in the test set for both training approaches and tensor types, while Table 2 shows the accuracy, sensitivity and specificity obtained for each label when training with 1-channel tensors. In the case of time independence, accuracy ranged from 0.81 to 0.975, although it worsened to 0.43 in the case of the drivers located in the septum area (label 7, accuracy of 0.430). These results are justified by the imbalance of our dataset. Labels from 0 to 5, although their population are also imbalanced, are highly represented, but the number of samples with labels 6 and 7 is very low. Results for the CNN model trained with 3-channel tensors were slightly worse. Regarding sensitivity, the system is able to detect with high precision clinically relevant regions. Those areas where ECGI has less reconstruction capacity (like the septum area), the sensitivity is lower. However, it is essential to remark that both in clinical practice and in our database these regions present a lower probability of present drivers. Regarding specificity, obtained scores were always above 0.94.
In the case of division in blocks, results were significantly worse. Highest accuracy was obtained for the label 3 (0.80), which is the one with the largest representation in the dataset. However, accuracy for the rest of labels ranged from 0.22 to 0.56, and near 0 for labels 6 and 7. The high temporal redundancy between consecutive signal samples could explain overfitting in this scenario, although the training set was previously shuffled before fitting the model. The low number of computerized  AF models makes also difficult to help the CNN model to generalize. Regarding sensitivity, results were also significantly worse, although the system was able to get a sensitivity of 0.799 in the most represented atrial zone (label 3). Finally, the rate of TN is very high for every label, except for the label 3, with a specificity of 0.635.

Performance of Pre-trained CNN and Noise Robustness
In this second experiment, we evaluated the possibility of using pre-trained CNN models for AF driver localization, instead of using a custom CNN architecture. Models were trained with 1channel tensors obtained with noisy BSP (SNR = 20 dB). Since pre-trained models require 3-channel tensors, we repeated the tensor twice to obtain 3-channel tensors. The custom CNN model was trained with the original 1-channel tensor.
Moreover, to assess the noise robustness of those DL models, we tested them with tensors whose associated BSP were corrupted with noise (SNR from 5 to 50 dB, in steps of 5 dB). This procedure was repeated 20 times in order to obtain test signals with different noise but same SNR. Then, the mean and SD of performance metrics were computed. Figure 5 shows the performance metrics obtained for the test set, assuming time independence and division in blocks, for each CNN model, whereas Table 3 shows the average accuracy On the other hand, average accuracy values obtained for the division in blocks approach were significantly worse for all the CNN models. The best average accuracy value was 0.51 among the different pre-trained models (SNR = 20 dB upward, Xception fully trained model), whereas the best overall accuracy value was 0.527 for the custom CNN model. Both are much lower values than the ones obtained for time independence.
Regarding the performance when testing with much noisier signals, performance degraded for SNR = 5 dB, with average accuracy scores below 0.802 when assuming time independence (0.479 with division in blocks). However, it improved from SNR = 10 dB with average accuracy values over 0.87 in the time independence approach (in the division in blocks approach the best score for SNR = 10 dB was 0.51 for the custom CNN model). Additionally, SD values of the accuracy were always below 0.001, which suggests that the CNN models are robust to noise changes.
Finally, performance metrics for fully trained pre-trained models were very similar between them when assuming time independence, outperforming the custom CNN model. However, in the case of division in blocks, the custom CNN outperformed fully trained pre-trained networks, although differences between these models were very small.

DISCUSSION AND CONCLUSIONS
The use of CNNs can help to identify target regions for ablation using body surface potential mapping, avoiding ECGI. The proposed method, which converts BSP into images, has been demonstrated to be accurate and robust to noise, i.e., the performance just degrades for very low values of SNR.
Regarding the architecture of the tensors, the 1-channel tensor-based architecture was able to obtain more accurate results than the 3-channel one, both assuming time independence or division in blocks. In relation to the training process, the single-channel tensor-based CNN model required a higher number of epochs to be fitted than in the 3-channel one. However, since each epoch is slower in the 3-channel architecture (3D matrix), the final training process becomes faster when using single-channel tensors.
Moreover, this methodology makes transfer learning very easy to apply, since it can be used to adapt much more complex pre-trained CNN models to a very specific task with promising results. However, results were significantly better when retraining the whole network (much slower procedure) than when training just the final layers. Therefore, using pre-trained models requires further research.
Although this study showed very promising results, it has several limitations that should be taken into account. The first one is the size and balance of our dataset. It is composed by 130 sets of BSP, obtained with 10 torso geometries but from only 13 AF computerized models, so the number of represented propagation patterns is low. Moreover, the distribution of drivers across the seven defined atrial regions is not balanced, i.e., there are regions that are over-represented, and performance will be worse on underrepresented regions. However, we have decided to use the proposed division in seven regions because it represents a clinical-based classification of the areas where the AF drivers are more commonly found and has been already clinically used to guide AF ablation strategies demonstrating to have a clear clinical significance (Haissaguerre et al., 2014). Besides, the proposed classification method can be easily extrapolated to other atrial geometry divisions based on a higher number of smaller regions, being able to get a higher resolution in the driver classification.
Another important problem to be faced is the training approach. In this study, we first trained the models with random samples that belongs to the 13 AF models available (time independence). However, in a real situation, the DL model will have to compute a prediction with a data set that the network has never seen. Therefore, we decided to train the models with a chunk of signals of each model, while testing with the rest. In this scenario, performance substantially worsened because of the high temporal redundancy between consecutive time instants and the low number of AF models. Both facts lead to overfit to the training set, even when using tools like dropout. Using RNNs could be useful to face the redundancy problem, since they are able to extract temporal characteristics of BSP that are omitted when using CNN.
In a real scenario, assuming model independence (i.e., training with some AF models and testing with the rest) should be the way to go, but it will require the availability of a higher number of computerized models and propagation patterns. Finally, another point of future work will be applying this methodology with real patient data. However, nowadays there is a lack of gold standard for the identification of AF drivers, and there are not labeled clinical data that could help us to validate this methodology.

DATA AVAILABILITY STATEMENT
Some of the computerized models used in our study are publicly available at the Experimental Data and Geometric Analysis Repository (EDGAR, https://edgar.sci.utah.edu/) by the Consortium for ECG Imaging (CEI, https://www. ecg-imaging.org/). Further inquiries can be directed to the corresponding authors.

AUTHOR CONTRIBUTIONS
MÁC-V and OB-P: experimental setup, code implementation, and computational tests. MÁC-V, IH-R, MG, AC, and OB-P: data collecting, cleaning, and pre-processing. MÁC-V, IH-R, EM-R, MG, AC, and OB-P: manuscript preparation. All authors contributed to the article and approved the submitted version.

FUNDING
This work has been partially supported by: Ministerio de Ciencia e Innovación (PID2019-105032GB-I00), Instituto de Salud Carlos III, and Ministerio de Ciencia, Innovación y Universidades (supported by FEDER Fondo Europeo de Desarrollo Regional PI17/01106 and RYC2018-024346B-750), Consejería de Ciencia, Universidades e Innovación of the Comunidad de Madrid through the program RIS3 (S-2020/L2-622), EIT Health (Activity code 19600, EIT Health is supported by EIT, a body of the European Union) and the European Union's Horizon 2020 research and innovation program under the Marie Skłodowska-Curie grant agreement No. 860974.