ORIGINAL RESEARCH article

Front. Syst. Biol., 17 December 2025

Sec. Integrative Systems Neuroscience

Volume 5 - 2025 | https://doi.org/10.3389/fsysb.2025.1715692

Neural networks and foundation models: two strategies for EEG-to-fMRI prediction

  • Ouroboros Neurotechnologies, Lausanne, Switzerland

Abstract

Electroencephalography (EEG) and functional Magnetic Resonance Imaging (fMRI) are two widely used neuroimaging techniques, with complementary strengths and weaknesses. Predicting fMRI activity from EEG activity could give us the best of both worlds, and open new horizons for neuroscience research and neurotechnology applications. Here, we formulate this prediction objective both as a classification task (predicting whether the fMRI signal increases or decreases) and a regression task (predicting the value of this signal). We follow two distinct strategies: training classical machine learning and deep learning models (including MLP, CNN, RNN, and transformer) on an EEG-fMRI dataset, or leveraging the capabilities of pre-trained large language models (LLMs) and large multimodal models. We show that predicting fMRI activity from EEG activity is possible for the brain regions defined by the Harvard-Oxford cortical atlas, in the context of subjects performing a neurofeedback task. Interestingly, both strategies yield promising results, possibly highlighting two complementary paths for our prediction objective. Furthermore, a Chain-of-Thought approach demonstrates that LLMs can infer the cognitive functions associated with EEG data, and subsequently predict the fMRI data from these cognitive functions. The natural combination of the two strategies, i.e., fine-tuning an LLM on an EEG-fMRI dataset, is not straightforward and would certainly require further study. These findings could provide important insights for enhancing neural interfaces and advancing toward a multimodal foundation model for neuroscience, integrating EEG, fMRI, and possibly other neuroimaging modalities.

1 Introduction

Electroencephalography (EEG) and functional Magnetic Resonance Imaging (fMRI) are two widely used neuroimaging techniques for human brain activity investigation. While EEG provides a direct measure of the electrical activity of the brain by using electrodes placed on the scalp, fMRI measures the activity of the brain indirectly, by detecting changes in the cerebral blood flow and oxygen demand. As a consequence, EEG is typically used to record the activity of cortical areas, whereas fMRI can detect the activity of subcortical regions as well. EEG provides a high temporal resolution, and is a relatively simple and inexpensive neuroimaging technique, compatible with wearable devices. By contrast, fMRI offers a high spatial resolution, at the expense of a greater complexity and cost, and requires the subject to remain immobile inside the scanner. These complementary strengths and weaknesses motivated the emergence of EEG-fMRI research, in which both techniques are used simultaneously (Abreu et al., 2018; Warbrick, 2022).

1.1 Predicting fMRI from EEG

The accessibility of EEG-fMRI datasets allowed researchers to explore whether fMRI activity could be predicted from EEG activity. Early research demonstrated that blood oxygenation level dependent (BOLD) (Ogawa et al., 1990) fluctuations in the occipital cortex can be predicted from EEG data, using a linear combination of B-spline basis functions (Sato et al., 2010). Beyond the occipital cortex, two early studies reported the successful prediction of amygdala activity, using classical machine learning models such as Ridge regression (Meir-Hasson et al., 2014; Keynan et al., 2016). Another study focused on connectivity, by applying a model based on sparse canonical correlation analysis to the prediction of the fMRI connectome from the EEG connectome at the scale of the entire brain (Deligianni et al., 2014).

More recently, deep learning models were applied to the prediction of fMRI activity from EEG activity, including convolutional neural networks (Li et al., 2019), attentional graphs (Calhas et al., 2023a), autoencoders and generative adversarial networks (Calhas et al., 2023b). Two preprints proposed novel neural network architectures with the objective to improve interpretability (Kovalev et al., 2022; Semenkov et al., 2024). Research also focused on addressing the hemodynamic delay of the BOLD signal (Simoes et al., 2020), and on predicting the activity of specific brain regions such as the inferior frontal gyrus (Or-Borichev et al., 2023) and the ventral striatum (Singer et al., 2023), respectively associated with cognitive control and reward processes.

Very recently, a series of studies and preprints proposed additional approaches, such as using sinusoidal representation networks (Li et al., 2024a), transformers (Li et al., 2024b; Lanzino et al., 2024), and simpler neural networks inspired by the U-Net architecture (Grover Roos et al., 2025). Overall, current research consistently highlights that predicting fMRI activity from EEG activity could significantly enhance neuroimaging capabilities by giving us the best of both worlds, and open new horizons for neuroscience research and neurotechnology applications (Calhas et al., 2023a; Calhas et al., 2023b; Kovalev et al., 2022; Semenkov et al., 2024; Simoes et al., 2020; Or-Borichev et al., 2023; Singer et al., 2023; Li et al., 2024a; Li et al., 2024b; Lanzino et al., 2024; Grover Roos et al., 2025).

1.2 Neurofeedback

One of the neurotechnologies that might benefit the most from EEG-to-fMRI prediction is neurofeedback (NF). NF consists in providing real-time information to a subject about their own brain activity, and encouraging them to adapt their behavior according to this measure. The objective of a NF protocol is for the subject to learn self-regulation and reach a certain cognitive state, whether by increasing or decreasing the power of specific EEG frequency bands (EEG NF), the fMRI activity in specific brain regions (fMRI NF), or a combination of both (EEG-fMRI NF) (Ciccarelli et al., 2023). A study demonstrated that fMRI NF scores and EEG-fMRI NF scores can be predicted from EEG data using a sparse regression model, and that this prediction adds significant information compared to EEG NF scores alone (Cury et al., 2020).

Since the prediction of fMRI activity from EEG activity is an emerging area of research, there is, to our knowledge, no systematic review identifying the most promising brain regions and cognitive processes that could be targeted for EEG-to-fMRI prediction. Our objective is not to provide such a systematic review. However, based on their importance for understanding human cognition, we speculate that the brain regions associated with valuation (Plassmann et al., 2007; Kolling et al., 2012), motivation (Pessiglione et al., 2007; Lebreton et al., 2009), exploration (Boorman et al., 2009), decision-making (Koechlin et al., 2003; Badre et al., 2009; Alexander and Brown, 2011), learning (O’Doherty et al., 2004; Daw et al., 2011), and reasoning (Donoso et al., 2014), in particular, might be valuable candidates. Arguably, all these cognitive functions are engaged during a NF protocol, potentially highlighting the importance of EEG-fMRI NF datasets for future research. This perspective motivated us to select such a dataset for our experiments, although our conclusions do not depend on the fact that the EEG-fMRI data was acquired during a NF protocol.

1.3 Toward a multimodal foundation model

We argue that a model capable of predicting fMRI activity from EEG activity at the scale of the entire brain would meet the definition of a foundation model, since it would likely be trained on broad data and could be adapted to a wide range of tasks (Bommasani et al., 2021). Recently, foundation models focusing on a single neuroimaging modality have been developed using self-supervised learning, whether for EEG (Kostas et al., 2021; Jiang et al., 2024a; Cui et al., 2024; Ogg et al., 2024) or fMRI (Shi et al., 2023; Ortega Caro et al., 2023; Ma et al., 2025). However, the model we envision would serve a different purpose. It would be a multimodal foundation model, capable of performing ā€œneural translationā€ between several neuroimaging modalities. Advancing toward this multimodal foundation model could come with significant challenges, considering both the scarcity and heterogeneity of EEG-fMRI datasets.

Here, we suggest that these challenges might eventually be overcome using a combination of two strategies. The first, classical strategy is the supervised learning approach, which consists in training classical machine learning and deep learning models on an EEG-fMRI dataset. The second, novel strategy consists in directly leveraging the capabilities of pre-trained large language models (LLMs) and large multimodal models (LMMs) for this prediction objective. Indeed, LLMs such as Gemma-2-2B-IT (Riviere et al., 2024) and Llama-3.2-3B-Instruct (Grattafiori et al., 2024), and LMMs such as PaliGemma2-3B-Mix-224 (Steiner et al., 2024) are all pre-trained on extensive corpora of documents, which we assume should include a significant number of neuroscience articles and books, or other sources of knowledge on EEG and fMRI patterns. This extensive pre-training might help us to overcome the scarcity and heterogeneity of EEG-fMRI datasets, by effectively outsourcing a part of the problem. Foundation models are rapidly emerging as powerful instruments for accelerating scientific discovery (Griffin et al., 2024; Gottweis and Natarajan, 2025), including in neuroscience (Wang and Chen, 2024), and a very recent study explored the possibility of encoding EEG signals as LLM-compatible tokens (Jiang et al., 2024b). However, to our knowledge, LLMs and LMMs have not yet been directly leveraged for the complex multimodal task of EEG-to-fMRI prediction.

1.4 Two strategies for EEG-to-fMRI prediction

Here, we evaluate these two strategies, i.e., training our own models or relying on pre-trained foundation models, on the same EEG-fMRI dataset. For both strategies, we formulate the prediction objective as a classification task: predicting whether the fMRI signal increases or decreases based on the EEG signal. For the first strategy, i.e., training our own models, we also add a regression task: predicting the value of the fMRI signal based on the EEG signal. Critically, leveraging LLMs or LMMs implies that both features and targets should be designed in order to be described by relevant keywords: EEG frequency bands ā€œwith namesā€ instead of arbitrary spectral representations, fMRI brain regions ā€œwith namesā€ instead of arbitrary voxel positions, and semantically meaningful classes (ā€œincreaseā€, ā€œdecreaseā€). Therefore, we do not leverage LLMs or LMMs for the regression task, whose targets are numerical.

Since neuroscience articles and books often associate EEG and fMRI patterns with cognitive functions, the keywords describing the latter might serve as intermediate representations between these two neuroimaging modalities. Following this idea, we hypothesize that LLMs could have the capability to infer the cognitive functions associated with EEG data, and subsequently predict the fMRI data from these cognitive functions. We test this hypothesis by implementing a Chain-of-Thought (CoT) (Wei et al., 2022) approach separating these two steps. Finally, we attempt the natural combination of the two strategies by fine-tuning an LLM on an EEG-fMRI dataset, in order to complement its pre-trained capabilities with additional, task-specific knowledge.

While potentially intriguing, the idea that a multimodal foundation model for EEG-to-fMRI prediction might be partially built upon existing language or vision-language foundation models is consistent with several emerging trends in AI research. In particular, it resonates with the growing interest in integrating foundation models with more specific models to address complex multimodal tasks (Brohan et al., 2023; Bansal et al., 2024). Furthermore, it aligns with the idea of using language as a universal interface between different data modalities, a strategy which proved to be successful, in a different field, to predict the evolution of proteins (Hayes et al., 2025). Overall, we believe that evaluating the capabilities of pre-trained foundation models, and comparing them with the performance of classical machine learning and deep learning models, is an important step that could open a new path for EEG-to-fMRI prediction.

2 Data

We use the publicly available EEG-fMRI NF dataset A multi-modal human neuroimaging dataset for data integration: simultaneous EEG and fMRI acquisition during a motor imagery neurofeedback task: XP1 (Lioi et al., 2020), which can be downloaded from OpenNeuro (Poldrack et al., 2013), an open repository for neuroimaging data. This dataset is the first published open-access bimodal NF dataset integrating EEG and fMRI, and was used in the EEG-fMRI NF study mentioned earlier (Cury et al., 2020). The neuroimaging files are stored in Brain Imaging Data Structure (BIDS) format (Gorgolewski et al., 2016), and the dataset is released under the CC0 license. The dataset authors conducted a NF experiment in which 10 subjects (8 male, 2 female, median age = 27) were instructed to use EEG NF scores and fMRI NF scores to perform as well as possible in a motor imagery task (i.e., they needed to mentally execute a movement without any muscle activation). The experiment included six conditions, with alternating rest and task blocks within the conditions.

In our research, we focus on three conditions, which were completed in random order by the different subjects: 1) The eegfmriNF condition, corresponding to bimodal EEG-fMRI NF. 2) The eegNF condition, corresponding to unimodal EEG NF. 3) The fmriNF condition, corresponding to unimodal fMRI NF. For each condition, the dataset includes the raw fMRI data, the raw EEG data, and the EEG data preprocessed by the dataset authors. The fMRI data was acquired with a 3T MRI scanner using echo-planar imaging, with a repetition time of 2 s and a voxel size of mm3. The EEG data was acquired with a 64-channel montage based on the extended 10–20 system, at a sampling rate of 5,000 Hz. The dataset authors subsequently resampled the EEG data to 200 Hz, and applied a low-pass filter at 50 Hz during their preprocessing. The acquisition of EEG data in an fMRI environment is technically complex and can result in high noise (Lioi et al., 2020), making it particularly useful that the dataset authors included the preprocessed EEG data. However, since the preprocessed fMRI data is not included, it is necessary to perform some standard fMRI preprocessing steps before running the experiments.

3 Methods

3.1 Preprocessing

3.1.1 fMRI preprocessing

We preprocess the raw fMRI data using fMRIPrep (Esteban et al., 2019), a robust preprocessing pipeline which automatically performs a series of standard fMRI preprocessing steps, such as motion correction, coregistration, and spatial normalization. In order to ensure subsequent compatibility with the Harvard-Oxford cortical atlas (Kennedy et al., 2003), we normalize the fMRI data using the MNI152Lin output space. We further preprocess the fMRI data using the NiBabel (Brett et al., 2020) and Nilearn (Alexandre et al., 2014) libraries. For each fMRI scan, we extract the average voxel values of our brain regions of interest, which are the regions defined in the Harvard-Oxford cortical atlas. Within this atlas, we use the maximum probability map with no threshold, assigning each voxel to the region with the highest probability, therefore ensuring full and non-overlapping cortical coverage. We remove a systematic drift of the BOLD signal, which tends to increase during an fMRI session. We also normalize the BOLD signal for each region by subtracting the mean and dividing by the standard deviation, and replace the outliers (STD > 3) with the value of the previous scan. The proportion of outliers is below 1% for every brain region. The fMRIPrep preprocessing is documented in Supplementary README.md, and the details about the additional fMRI preprocessing can be found in Supplementary Notebook 1.
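For illustration, the drift removal, normalization, and outlier replacement steps can be sketched as follows (a minimal NumPy sketch operating on one already-extracted regional time series; the function and variable names are ours, not those of the supplementary notebooks):

```python
import numpy as np

def clean_bold(signal, outlier_std=3.0):
    """Detrend, z-score, and replace outliers in a 1D regional BOLD time series."""
    t = np.arange(len(signal))
    # Remove the systematic linear drift of the BOLD signal
    slope, intercept = np.polyfit(t, signal, 1)
    detrended = signal - (slope * t + intercept)
    # Normalize by subtracting the mean and dividing by the standard deviation
    z = (detrended - detrended.mean()) / detrended.std()
    # Replace outliers (|z| > outlier_std) with the value of the previous scan
    for i in range(1, len(z)):
        if abs(z[i]) > outlier_std:
            z[i] = z[i - 1]
    return z
```

Note that the very first scan cannot be replaced this way, since it has no previous value; in practice this edge case is negligible given the low proportion of outliers.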

3.1.2 EEG preprocessing

We preprocess the EEG data using the MNE-Python (Gramfort et al., 2013) and YASA (Vallat and Walker, 2021) libraries. We divide the EEG data into 2-second segments aligned with the fMRI scans. For each EEG segment, we compute the band powers for a series of frequency bands of interest: delta (1–4 Hz), theta (4–8 Hz), alpha (8–12 Hz), sigma (12–16 Hz), beta (16–30 Hz), and gamma (30–40 Hz). We normalize the band powers for each channel by subtracting the mean and dividing by the standard deviation, and replace the outliers (STD > 4) with the value of the previous segment. The proportion of outliers is below 0.1% for every frequency band. The details can be found in Supplementary Notebook 2.
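The band power computation can be sketched as follows (a minimal illustration using SciPy's Welch estimator rather than the YASA implementation we actually use; the band definitions match those listed above):

```python
import numpy as np
from scipy.signal import welch

BANDS = {"delta": (1, 4), "theta": (4, 8), "alpha": (8, 12),
         "sigma": (12, 16), "beta": (16, 30), "gamma": (30, 40)}

def band_powers(segment, sfreq=200):
    """Absolute band powers for one 2-second EEG segment of a single channel."""
    freqs, psd = welch(segment, fs=sfreq, nperseg=min(len(segment), 2 * sfreq))
    df = freqs[1] - freqs[0]
    # Integrate the power spectral density over each frequency band of interest
    return {name: psd[(freqs >= lo) & (freqs < hi)].sum() * df
            for name, (lo, hi) in BANDS.items()}
```

The resulting per-band values would then be z-scored across segments, channel by channel, as described above.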

3.1.3 Outliers

For both fMRI and EEG data, the replacement of outliers with the value of the previous time point is a simple and conservative approach. It allows for the correction of the limited number of outliers in a way that is consistent across both neuroimaging modalities, while minimizing the risk of introducing artificial temporal patterns. For noisier EEG-fMRI datasets, for example, publicly available datasets that may not have been preprocessed to the same extent by the dataset authors, alternative interpolation techniques could be considered, such as spline or autoregressive interpolation for EEG data, and Kalman filtering or low-rank matrix completion for fMRI data.

3.1.4 Features and targets

For the classification task, the target is a binary label indicating whether the normalized BOLD signal is increasing or decreasing between two successive scans. For the regression task, the target is the normalized BOLD signal at a given scan. For both tasks, unless stated otherwise, the features consist of the normalized band powers computed at a given scan, along with those computed during the 5 preceding scans. This 5-scan sequence corresponds to 10 s, a duration that encompasses the peak of the hemodynamic response function (Glover, 1999), therefore improving our chances of capturing, in the EEG data, a trace of the events that influenced the fMRI data. Since the eegNF condition of one subject is missing, we remove this subject from all the experiments.
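The construction of features and targets can be sketched as follows (a minimal NumPy illustration; the array layouts are assumptions, not the exact format used in the supplementary notebooks):

```python
import numpy as np

def make_features_targets(band_powers, bold, n_lags=5):
    """band_powers: (n_scans, n_features) normalized band powers.
    bold: (n_scans,) normalized BOLD signal of one region.
    Returns features spanning the current scan plus the n_lags preceding scans,
    a binary classification target (signal increasing), and a regression target."""
    X, y_cls, y_reg = [], [], []
    for t in range(n_lags, len(bold)):
        X.append(band_powers[t - n_lags:t + 1].ravel())  # 6-scan feature window
        y_cls.append(int(bold[t] > bold[t - 1]))          # increase vs. decrease
        y_reg.append(bold[t])                             # signal value
    return np.array(X), np.array(y_cls), np.array(y_reg)
```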

3.2 Machine learning models

We train a series of classical machine learning models using the Scikit-Learn (Pedregosa et al., 2011) library. For the classification task, we select the following models: logistic regression, k-nearest neighbors (KNN), decision tree (DT), random forest (RF), support vector machine (SVM), and extreme gradient boosting (XGBoost) (Chen and Guestrin, 2016). For the regression task, we select the following models: linear regression, KNN, DT, RF, SVM, and XGBoost. We evaluate the accuracy of the classification task, and the mean absolute error (MAE) of the regression task. For both tasks, to ensure a robust evaluation across sessions, we implement a threefold rotation of the available conditions (eegfmriNF, eegNF, fmriNF) within each subject. In each iteration, one condition serves as the train set and another one as the test set, while the remaining condition is left out. This cross-validation design ensures that every condition is used exactly once for training and once for testing. We also fine-tune the number of neighbors for the KNN model, and the depth of the tree for the DT model. The details can be found in Supplementary Notebooks 3–4, 15–16.
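One simple realization of this threefold condition rotation can be sketched as follows (the exact train/test assignment order is an assumption; the supplementary notebooks may rotate differently):

```python
CONDITIONS = ["eegfmriNF", "eegNF", "fmriNF"]

def condition_rotation(conditions=CONDITIONS):
    """Threefold rotation: each condition is used exactly once as the train
    set and once as the test set; the remaining condition is left out."""
    n = len(conditions)
    return [(conditions[i], conditions[(i + 1) % n]) for i in range(n)]
```

For the deep learning models described below, the left-out condition would additionally serve as the validation set.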

3.3 Deep learning models

We train a series of deep learning models using the TensorFlow (Abadi et al., 2016) library. We select the following architectures: multi-layer perceptron (MLP), convolutional neural network (CNN) (LeCun et al., 1998), recurrent neural network (RNN) with gated recurrent units (Cho et al., 2014), and transformer (Vaswani et al., 2017). For the classification task, we use the binary cross-entropy loss function and evaluate the accuracy, whereas for the regression task, we use the mean squared error loss function and evaluate the MAE. For both tasks, to ensure a robust evaluation across sessions, we implement a threefold rotation of the available conditions (eegfmriNF, eegNF, fmriNF) within each subject. In each iteration, one condition serves as the train set, another one as the validation set, and the remaining condition as the test set. This cross-validation design ensures that every condition is used exactly once for training, once for validating, and once for testing, while allowing a symmetrical treatment of the classical machine learning and deep learning models. We train our models on the normalized band powers, except for the CNN model, used as a control, for which we rely directly on the EEG signal without band power extraction. For the CNN model, we use 1D convolutional layers, with the EEG channels defined as the input channels of the network. For all models, we use the Adam (Kingma and Ba, 2014) optimizer and the ReLU activation function for hidden layers. The details, including the exact architecture and number of trainable parameters for each model, can be found in Supplementary Notebooks 5–8, 17–20.
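As a schematic illustration of the simplest of these architectures, the forward pass of a single-hidden-layer MLP classifier can be written as follows (a NumPy sketch; the layer sizes and weights are illustrative, and the exact architectures are given in the supplementary notebooks):

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def mlp_forward(x, params):
    """Forward pass of a single-hidden-layer MLP classifier.
    x: flattened band powers (6 scans x frequency bands x channels).
    Returns the predicted probability that the BOLD signal increases."""
    h = relu(params["W1"] @ x + params["b1"])        # hidden layer, ReLU activation
    return sigmoid(params["W2"] @ h + params["b2"])  # sigmoid head for binary cross-entropy
```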

3.4 Foundation models

3.4.1 Large language models

We leverage two LLMs, Gemma-2-2B-IT and Llama-3.2-3B-Instruct, using the Hugging Face Transformers (Wolf et al., 2020) library. We focus on the classification task and the eegfmriNF condition, and since LLMs require significantly more computational resources than our classical machine learning and deep learning models, we select only a subset of our fMRI scans, band powers, and brain regions of interest. In order to further control the complexity of the prompts, we associate each fMRI brain region with a single EEG channel, which serves as its sole predictor. This association follows an ad hoc region-channel mapping, mainly based on electrode proximity to the brain region, and established, perhaps fittingly, with the help of an LLM. We query both models with prompts including a general context, the selected EEG channel, band powers, and brain region, the normalized band power values, and a description of the prediction task to perform. In this experiment and the following ones, we select the model parameters in order to ensure a relative variety of responses, and parse the model outputs by detecting the presence of inflected forms of the keywords ā€œincreaseā€ or ā€œdecreaseā€ (e.g., ā€œincreasesā€, ā€œincreasedā€, ā€œincreasingā€) in the generated text. The cases where these keywords are missing, or where both keywords are present, are labeled as invalid predictions. We also ensure that the selected subset of brain regions spans at least a reasonable fraction of the cortex and a variety of cognitive functions.

The prompts follow this pattern: ā€œA human subject participated in a neuroscience experiment where EEG data and fMRI data were recorded simultaneously. The EEG band powers were measured every two seconds at electrode [selected EEG channel], with the following results: [EEG results]. The fMRI BOLD signal was measured in the [selected brain region] during the last two seconds. Given the EEG data, is the fMRI signal in this brain region likely increasing or decreasing during these last two seconds? Base your answer on your general knowledge in neuroscience, EEG research, and fMRI research. Since this is a time series, you might need to take into account the hemodynamic response function (HRF), and the fact that after an event, the fMRI response is delayed compared to the EEG response. Please answer with only one word: Increasing or Decreasing. Just give your best prediction, without any explanation.ā€ The selected EEG channel (e.g., ā€œFpzā€) and the selected brain region (e.g., ā€œFrontal Poleā€) are given in plain text. The EEG results are given as a dictionary-like structure, with each frequency band (e.g., ā€œBeta (16–30 Hz)ā€) associated with the comma-separated values of the different time steps, presented between square brackets. The details, including the region-channel mapping, can be found in Supplementary Notebook 9.
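The keyword-based parsing of the model outputs (detecting inflected forms of ā€œincreaseā€ or ā€œdecreaseā€, and labeling as invalid the cases where neither or both appear) can be sketched as follows:

```python
import re

def parse_prediction(text):
    """Detect inflected forms of 'increase' / 'decrease' in a model response.
    Returns 'increase', 'decrease', or None for an invalid prediction."""
    inc = re.search(r"\bincreas", text, flags=re.IGNORECASE) is not None
    dec = re.search(r"\bdecreas", text, flags=re.IGNORECASE) is not None
    if inc == dec:
        return None  # both keywords present, or neither: invalid prediction
    return "increase" if inc else "decrease"
```

Matching on the stems ā€œincreas-ā€/ā€œdecreas-ā€ covers the inflected forms (e.g., ā€œincreasesā€, ā€œincreasedā€, ā€œincreasingā€) without enumerating them.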

3.4.2 Chain-of-thought and fine-tuning

We conduct two additional experiments with Gemma-2-2B-IT and a single EEG channel. In the first experiment, we implement an intermediate reasoning step using a CoT approach. Specifically, we query the model with a first prompt to infer cognitive functions based on EEG data, then use a second prompt to infer fMRI data based on these cognitive functions. Both prompts also include a general context and a description of the prediction task to perform.

The EEG-to-cognition prompts follow this pattern: ā€œA human subject participated in a neuroscience experiment where EEG data and fMRI data were recorded simultaneously. The EEG band powers were measured every two seconds at electrode [selected EEG channel], with the following results: [EEG results]. Given the EEG data, which cognitive functions is the subject likely engaging in? Base your answer on your general knowledge in neuroscience and EEG research. Please answer with only a list of cognitive functions. Just give your best prediction, without any explanation.ā€ The subsequent cognition-to-fMRI prompts follow this pattern: ā€œA human subject participated in a neuroscience experiment where EEG data and fMRI data were recorded simultaneously. This subject experienced the following cognitive functions: [predicted cognitive functions]. The fMRI BOLD signal was measured in the [selected brain region]. Given the cognitive functions, is the fMRI signal in this brain region likely increasing or decreasing? Base your answer on your general knowledge in neuroscience and fMRI research. Please answer with only one word: Increasing or Decreasing. Just give your best prediction, without any explanation.ā€ The predicted cognitive functions (e.g., ā€œAttention, Working Memoryā€) are given in plain text. The details can be found in Supplementary Notebook 10.
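The two-step CoT pipeline can be sketched as follows (query_llm is a hypothetical stand-in for the actual Hugging Face text-generation call, and the prompt texts are abbreviated relative to the patterns above):

```python
def chain_of_thought_predict(eeg_results, channel, region, query_llm):
    """Two-step CoT: infer cognitive functions from EEG data, then predict
    the fMRI direction from these functions. query_llm stands in for the
    actual text-generation call."""
    step1 = (f"The EEG band powers were measured every two seconds at "
             f"electrode {channel}, with the following results: {eeg_results}. "
             "Which cognitive functions is the subject likely engaging in?")
    functions = query_llm(step1)  # e.g., "Attention, Working Memory"
    step2 = (f"This subject experienced the following cognitive functions: "
             f"{functions}. The fMRI BOLD signal was measured in the {region}. "
             "Is the fMRI signal in this brain region likely increasing or "
             "decreasing?")
    return query_llm(step2)  # parsed downstream into increase/decrease
```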

In the second experiment, we fine-tune the model with parameter-efficient fine-tuning (Dettmers et al., 2023) on input-output pairs obtained from the fmriNF condition, before evaluating its performance again on the eegfmriNF condition. Fine-tuning is implemented by applying low-rank adaptation (Hu et al., 2022) to the query and value projection layers, and the model is optimized using the AdamW (Loshchilov and Hutter, 2017) optimizer. To improve stability and prevent erratic behavior (e.g., the model generating a long chain of ā€œincrease decrease increase decrease…ā€), the model is fine-tuned independently for each subject-region pair. We use the same prompt pattern as for the initial, single-step experiment with Gemma-2-2B-IT. The details can be found in Supplementary Notebook 11.

3.4.3 Large multimodal model

We leverage one LMM, PaliGemma2-3B-Mix-224, using the Hugging Face Transformers library. We focus again on the classification task, the eegfmriNF condition, and a subset of our fMRI scans and brain regions of interest. However, instead of using multiple frequency bands and a single EEG channel, we take the opposite approach. Specifically, we create a topographic map using the MNE-Python library, displaying the beta band power (16–30 Hz) across all EEG channels. We prompt the model with this image, along with a general context, the selected brain region, and a description of the prediction task to perform.

The prompts follow this pattern: ā€œA human subject participated in a neuroscience experiment where EEG data and fMRI data were recorded simultaneously. This EEG topographic map shows the brain activity pattern observed for the band power [selected frequency band]. The fMRI BOLD signal was measured in the [selected brain region] four seconds after that. Given the EEG data, is the fMRI signal in this brain region likely increasing or decreasing? Base your answer on your general knowledge in neuroscience, EEG research, and fMRI research. Please answer with only one word: Increasing or Decreasing. Just give your best prediction, without any explanation.ā€ The selected frequency band (e.g., ā€œBeta (16–30 Hz)ā€) and the selected brain region (e.g., ā€œFrontal Poleā€) are given in plain text, and the topographic map is added to the prompt. The details can be found in Supplementary Notebook 12.

3.4.4 Large language model with five EEG channels

We conduct one last experiment with Gemma-2-2B-IT, this time using an extended version of the region-channel mapping, where each fMRI brain region is now associated with five EEG channels instead of one. We use essentially the same prompt pattern as for the initial, single-channel experiment with Gemma-2-2B-IT, but adapt the structure of the EEG results to accommodate the new multichannel information. Specifically, the EEG results are presented as a hierarchical structure, where the frequency bands and their corresponding values are given separately for each electrode. The details, including the extended region-channel mapping, can be found in Supplementary Notebook 23.
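The hierarchical structure of the multichannel EEG results can be sketched as follows (the exact textual format is an assumption; the supplementary notebook defines the version actually used):

```python
def format_multichannel_results(band_powers):
    """band_powers: {channel: {band: [normalized values per time step]}}.
    Renders a hierarchical, per-electrode structure for the prompt, with the
    frequency bands and their values given separately for each electrode."""
    lines = []
    for channel, bands in band_powers.items():
        lines.append(f"Electrode {channel}:")
        for band, values in bands.items():
            joined = ", ".join(f"{v:.2f}" for v in values)
            lines.append(f"  {band}: [{joined}]")
    return "\n".join(lines)
```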

3.5 Statistical tests

For each model, we compare the mean accuracy or MAE to a baseline, across all subject-region pairs (for the classical machine learning and deep learning models: 9 subjects Ɨ 49 regions = 441 pairs per cross-validation iteration, totaling 1,323 pairs overall). Since we do not assume normality, we perform a one-sided Wilcoxon signed-rank test using the SciPy (Virtanen et al., 2020) library, proceeding similarly for the classification and regression tasks (but searching for opposite effects: higher accuracy for classification, lower MAE for regression). For the foundation models, we also perform McNemar tests, pooling together the predictions from all subjects and brain regions, to compensate for the smaller sample size. Indeed, not only are the foundation models evaluated on a selection of our fMRI scans of interest, but missing or ambiguous predictions must be dynamically excluded, resulting in significantly fewer data points. We also report additional metrics, such as the rank-biserial correlation (RBC), the common language effect size (CLES), as well as the Pearson correlation coefficient for regression, and we perform Wilcoxon signed-rank tests between each pair of models for direct statistical comparison. Wilcoxon signed-rank tests are also performed individually for each region of interest (ROI), corresponding to the cortical regions of the Harvard-Oxford atlas.

For the classical machine learning and deep learning models, the baselines for our statistical tests are relatively straightforward. For the classification task, we use as the baseline the constant prediction of an increase, which corresponds to the majority class, although the two classes are almost perfectly balanced (∼50% each). For the regression task, we use as the baseline the constant prediction of the mean value of the signal, which is zero due to the normalization. For the foundation models, given the lower number of fMRI scans and their more variable target distribution, we proceed differently. We use as the baseline the predictions of a model that randomly samples labels from the true target distribution of the selected fMRI scans, excluding the missing or ambiguous cases from this distribution. We repeat the sampling for 1,000 iterations (Wilcoxon tests) or 10,000 iterations (McNemar tests). The details can be found in Supplementary Notebooks 13, 21, 24, 26.
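The baseline sampling and the one-sided Wilcoxon signed-rank comparison can be sketched as follows (a minimal SciPy illustration; accuracies are assumed to be aggregated per subject-region pair as described above):

```python
import numpy as np
from scipy.stats import wilcoxon

def sample_baseline(labels, n_iter=1000, seed=0):
    """Baseline predictions randomly sampled from the true target
    distribution of the selected scans, repeated for n_iter iterations."""
    rng = np.random.default_rng(seed)
    return [rng.choice(labels, size=len(labels)) for _ in range(n_iter)]

def compare_to_baseline(model_acc, baseline_acc):
    """One-sided Wilcoxon signed-rank test across subject-region pairs:
    does the model reach a higher accuracy than the baseline?"""
    _, p_value = wilcoxon(model_acc, baseline_acc, alternative="greater")
    return p_value
```

For the regression task, the same test would be run with alternative="less" on the MAE values, since a lower error is the effect of interest.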

4 Results

4.1 Machine learning models

All the classical machine learning models for classification, i.e., the logistic regression, KNN, DT, RF, SVM, and XGBoost models, reach an accuracy higher than the baseline (p < 0.01 for KNN, p < 0.001 for the other models), as shown in Figure 1. The best model (SVM) performs slightly better (p = 0.033) than the second best (RF). Among the classical machine learning models for regression, only the RF (p < 0.05, Pearson r = 0.145) and SVM (p < 0.001, Pearson r = 0.175) models perform better than the baseline, as shown in Figure 2. The detailed results can be found in Supplementary Notebooks 14, 22, and the statistics in Supplementary Tables A–D.

FIGURE 1

Bar chart comparing the mean accuracy of various machine learning models. The baseline is green with an accuracy around 0.5. Logistic regression, KNN, DT, RF, SVM, XGBoost, MLP, CNN, RNN, and Transformer models are in shades of blue, with accuracy mostly above 0.50. Asterisks indicate significance levels.

Machine learning and deep learning models for classification. The error bars represent the standard error of the mean across subject-region pairs. Significance is indicated by asterisks: * for p < 0.05, ** for p < 0.01, and *** for p < 0.001. The dashed line indicates the baseline level.

FIGURE 2

Bar chart comparing mean MAE for various models. Baseline is green; others (Linear, KNN, DT, RF, SVM, XGBoost, MLP, CNN, RNN, Transformer) are blue. Linear shows the highest MAE. Statistical significance is marked for RF and SVM.

Machine learning and deep learning models for regression. The error bars represent the standard error of the mean across subject-region pairs. Significance is indicated by asterisks: * for p < 0.05, ** for p < 0.01, and *** for p < 0.001. The dashed line indicates the baseline level.

4.2 Deep learning models

Among the deep learning models for classification, the MLP, RNN, and transformer models reach an accuracy higher than the baseline (p < 0.001), as shown in Figure 1. However, the training graphs of all models except the MLP show signs of overfitting, with the validation loss often stagnating or even increasing over the epochs. The best classical machine learning model (SVM) reaches a slightly better performance than the best deep learning model (MLP), but the difference is not statistically significant (p = 0.144). Conversely, the MLP model outperforms the second best classical machine learning model (RF), but again, the difference does not reach statistical significance (p = 0.181). Among the deep learning models for regression, no model performs better than the baseline, as shown in Figure 2. We also evaluate the number of floating-point operations (FLOPs) necessary for running the MLP (∼23M), CNN (∼51M), RNN (∼39M), and transformer (∼76M) models, to confirm that the higher performance of the MLP model for the classification task is not due to model complexity. The training graphs are displayed in Supplementary Notebooks 5–8, 17–20, while the detailed results can be found in Supplementary Notebooks 14, 22, and the statistics in Supplementary Tables A–D.
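The FLOP counts above depend on the exact architectures, which are not reproduced here; as a rough sketch, the forward-pass cost of a dense network can be approximated from its layer sizes alone (the sizes below are purely illustrative, not those used in the study).

```python
def dense_flops(layer_sizes):
    """Approximate forward-pass FLOPs of an MLP: each dense layer performs
    n_in * n_out multiply-accumulate operations, i.e., ~2 * n_in * n_out FLOPs
    (biases and activations contribute comparatively little and are ignored)."""
    return sum(2 * n_in * n_out
               for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]))

# Illustrative layer sizes only:
print(dense_flops([64, 256, 256, 1]))  # 2 * (64*256 + 256*256 + 256*1) = 164352
```

Convolutional, recurrent, and attention layers each have their own analogous per-layer formulas, which is why the CNN, RNN, and transformer counts differ despite broadly similar parameter budgets.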

4.3 Foundation models

Neither the LLMs nor the LMM performs better than the baseline when the predictions are evaluated using the one-sided Wilcoxon signed-rank test. However, when the predictions are evaluated using the less conservative McNemar test across multiple iterations, a more nuanced view emerges. For the different instances of the Gemma model prompted with a single EEG channel, we observe a relatively low median p-value (median p = 0.12, 0.16, 0.15, 0.14) and a high proportion of statistically significant iterations (proportion p < 0.05 = 29%, 23%, 23%, 24%). The Llama model and the Gemma model prompted with five EEG channels show the opposite trend, with a relatively high median p-value (median p = 0.66, 0.64) and a low proportion of statistically significant iterations (proportion p < 0.05 = 0.5%, 0.7%). The PaliGemma model shows an intermediate pattern (median p = 0.31, proportion p < 0.05 = 8.6%). The proportion of missing or ambiguous predictions varies across models, but is always less than 10%. The detailed results can be found in Supplementary Notebooks 13–14, 24–25, and the statistics in Supplementary Tables E, F.

The visualization of the p-value distributions using histograms provides important insights as well. As shown in Figure 3, for the different instances of the Gemma model prompted with a single EEG channel, we observe a right-skewed p-value distribution, with a mode near p = 0 and a long tail toward p = 1, suggesting that the Gemma model achieves small but reliable gains over the baseline, whether it is used with direct prompting or with a CoT approach. The Llama model and the Gemma model prompted with five EEG channels show almost the opposite trend, with a weakly left-skewed p-value distribution, suggesting an absence of improvement over the baseline. The PaliGemma model shows again an intermediate pattern, with a weakly right-skewed p-value distribution, suggesting a more ambiguous behavior. We also evaluate the number of FLOPs necessary for running the Gemma single-channel (∼3T), Llama (∼3.7T), PaliGemma (∼1.6T), and Gemma five-channel (∼6.7T) model instances, obtaining values that stand several orders of magnitude above those measured for the classical machine learning and deep learning models. Comparing the performance of the Gemma model with and without fine-tuning, we observe that the difference is not statistically significant, whether the comparison is performed using the one-sided Wilcoxon signed-rank test (p = 0.42) or the McNemar test (p = 0.87). By contrast, the Gemma model prompted with a single EEG channel performs significantly better than the same model prompted with five EEG channels, whether the comparison is performed using the one-sided Wilcoxon signed-rank test (p = 0.018) or the McNemar test (p = 0.021).

FIGURE 3

Seven histograms display the frequency of p-values from 0 to 1 for different models: Gemma (a), Llama (b), Gemma CoT (c), Gemma without FT (d), Gemma with FT (e), PaliGemma (f), and Gemma 5-Channel (g). Each plot shows varied distributions, many with frequencies peaking at the left end, especially in Gemma and Gemma CoT, indicating lower p-values are more frequent.

Distribution of p-values for the Gemma (a), Llama (b), Gemma CoT (c), Gemma without fine-tuning (d), Gemma with fine-tuning (e), PaliGemma (f), and Gemma 5-channel (g) foundation model instances. The dashed lines indicate the p = 0.05 significance threshold.

4.4 Regions of interest

The region-wise analyses reveal substantial variability in the performance of the models across the cortical regions of the Harvard-Oxford atlas. For both the classification and the regression tasks, the classical machine learning and deep learning models reach their best performance in ROIs such as the cuneal cortex, the posterior division of the inferior temporal gyrus, and the inferior division of the lateral occipital cortex. By contrast, the performance of the models is more limited in ROIs such as the insular cortex and Heschl’s gyrus. Focusing on the classification task and the RF, SVM, and MLP models, the spatial distribution of the accuracy reveals some interesting patterns, as shown in Figure 4. While the three models display distinct local differences, they also share some global similarities, such as a lower accuracy around the insular cortex, and a higher accuracy in some parts of the occipital cortex. The detailed results can be found in Supplementary Notebooks 26–27, and the statistics in Supplementary Tables G–J.
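Maps like those in Figure 4 can be assembled by broadcasting each ROI's accuracy over an atlas label volume; the sketch below assumes a maxprob-style integer atlas (e.g., as returned by nilearn's `fetch_atlas_harvard_oxford`) and is our illustration, not the study's plotting code.

```python
import numpy as np

def roi_accuracy_map(atlas_labels, roi_accuracy):
    """Project per-ROI accuracy values into voxel space.

    atlas_labels: integer volume where 0 is background and positive values
    index the atlas ROIs; roi_accuracy: dict mapping ROI index -> accuracy.
    Voxels without a mapped ROI are set to NaN so they can be masked out."""
    out = np.full(atlas_labels.shape, np.nan, dtype=float)
    for roi_id, acc in roi_accuracy.items():
        out[atlas_labels == roi_id] = acc
    return out
```

The resulting volume can then be wrapped in a NIfTI image and rendered across sagittal, coronal, and axial views, for example with nilearn's plotting utilities.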

FIGURE 4

Nine heat maps showing brain regions in sagittal, coronal, and axial views across three models: RF (Random Forest), SVM (Support Vector Machine), and MLP (Multilayer Perceptron). Each view features varying shades of red and blue indicating different levels of accuracy, with similar patterns observed across models.

Distribution of accuracy values across cortical regions. The ROI accuracy values are displayed for the three best-performing classification models (RF, SVM, MLP) across the sagittal, coronal, and axial views. Blue indicates lower accuracy, and red indicates higher accuracy.

5 Discussion

5.1 First strategy

The first strategy, which consists in training classical machine learning and deep learning models, proves to be the most successful. Interestingly, the MLP model is one of the best-performing models for the classification task, and the only deep learning model achieving similar or higher accuracy than the classical machine learning models. The superiority of the MLP model over the CNN, RNN, and transformer models is not due to model complexity, as confirmed by the number of FLOPs. Rather, it suggests that the relevant patterns for EEG-to-fMRI prediction might be globally distributed across space (EEG channels) and time (fMRI scans), therefore favoring feedforward neural network architectures which do not rely too strongly on spatial or temporal priors. We might expect that this global distribution of relevant features could also favor text-only LLMs over LMMs, since the latter typically rely on data with stronger structure, such as images. The superiority of the MLP model is also consistent with a very recent preprint suggesting that simple neural networks for EEG-to-fMRI prediction can outperform more complex ones (Grover Roos et al., 2025).

The region-wise analyses are constrained by the objective of this research, which is to evaluate the two strategies for EEG-to-fMRI prediction in a relatively symmetrical manner. While the use of fMRI brain regions “with names”, such as the cortical regions of the Harvard-Oxford atlas, is necessary for leveraging LLMs and LMMs, it is unclear whether these ROIs would be optimal in the context of classical machine learning and deep learning models alone. For example, the relatively large “Occipital Pole” region, observable at the far left of the sagittal view in Figure 4, displays limited accuracy for the RF and SVM models, whereas neighboring occipital areas show higher predictability. This suggests that the size of the ROIs, along with other variables, might have an impact on the quality of the prediction, possibly because larger regions tend to implement more diverse cognitive processes. We speculate that training the models on more specific brain regions, or even directly at the level of individual voxels, could potentially provide additional insights on the spatial variations in predictability, at the expense of a greater computational cost. Despite this general constraint, the patterns observed seem informative for evaluating the possibilities of EEG-to-fMRI prediction, while remaining consistent with our scientific knowledge on the human brain. In particular, the low accuracy observed around the insular cortex may reflect the difficulty of measuring the electrical activity of this deep cortical region using EEG, whereas the high accuracy observed in some parts of the occipital cortex may be explained by the strong EEG correlates of visual processes.

Intriguingly, although the majority of our neural network architectures achieve significant results in the classification task, none of them performs better than the baseline in the regression task, whereas the RF and SVM models do. This limitation of our neural networks should not be interpreted as evidence that deep learning models are generally unable to perform such regression task: although EEG and fMRI remain very different measures of the activity of the brain, recent studies have demonstrated the feasibility of the EEG-to-fMRI regression task when using more complex and specialized deep learning models, in particular in resting-state contexts (Li et al., 2024b). Rather, this limitation of our neural networks seems to indicate that our simple architectures are insufficient to extract meaningful information for the regression task, despite their success on the classification task. In other words, the complexity threshold for a minimally working neural network might be higher for the regression task, possibly because the fluctuations of EEG or fMRI data at multiple time scales prevent the models from precisely inferring the amplitude of the fMRI signal, while still allowing them to infer its direction. Remarkably, the RF and SVM models maintain a better performance, suggesting that these simple machine learning models might be less sensitive to noise and drift in this particular context. The extent to which EEG data can be used to predict the amplitude of fMRI data is an open question, and the level of precision required for real-world applications will most certainly depend on the intended usage. Nevertheless, the differences observed between the classification and regression tasks suggest that targeting the direction of the fMRI signal, rather than its amplitude, might provide an easier path forward for enhancing neural interfaces with EEG-to-fMRI prediction, particularly in the context of wearable devices with limited computing power.
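The contrast between the two task formulations can be made concrete with a toy target construction (our illustration, not the study's exact preprocessing): from a normalized ROI time series, classification targets the direction of the scan-to-scan change, while regression targets the value itself, whose mean is zero after normalization.

```python
import numpy as np

def make_targets(bold):
    """Toy targets from an ROI BOLD time series: z-score the signal, then
    derive a binary direction label (1 = increase between consecutive scans,
    0 = decrease) and keep the z-scored values as the regression target."""
    z = (np.asarray(bold, dtype=float) - np.mean(bold)) / np.std(bold)
    direction = (np.diff(z) > 0).astype(int)
    return direction, z
```

Under this construction, the constant regression baseline is simply the zero vector, while the classification baseline predicts the majority direction, matching the baselines described in the Methods.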

5.2 Second strategy

The second strategy, which consists in directly leveraging the capabilities of pre-trained foundation models, shows promising results as well. Although the observed effects are much smaller than for the classical machine learning and deep learning models, the Gemma model prompted with a single EEG channel achieves statistically reliable gains over the baseline. Furthermore, our CoT approach demonstrates that the Gemma model can infer the cognitive functions associated with EEG data, and subsequently predict the fMRI data from these cognitive functions. This suggests the possibility that the performance of the Gemma model with direct prompting might be driven by a similar mechanism, with the model implicitly relying on cognitive functions as an intermediate reasoning step between EEG and fMRI, even in the absence of native CoT capabilities. The lower performance of the Llama and PaliGemma models might reflect their intrinsic limitation for this particular prediction task, or simply suggest that more exploration is needed to adapt the prompts and parameters for each foundation model. The absence of improvement observed after fine-tuning the Gemma model may seem intriguing, but while it is difficult to provide a definitive interpretation, a very recent study highlighted that the result of fine-tuning an LLM for a scientific problem depends both on the dataset and on the complexity of the question (Van Herc et al., 2025). This research reported a weak predictive power for complex input variables and a very low number of training data points, which is precisely our situation. Overcoming this difficulty, and establishing a more successful fine-tuning strategy, would certainly require further study.

Another result that may seem counterintuitive is the degradation of the performance of the Gemma model when prompted with five EEG channels instead of one. Classical machine learning and deep learning models would typically benefit from multichannel integration, as adding new EEG channels would allow these models to exploit several complementary sources of information, as well as the interactions between them. Such integration is particularly important in EEG research for decoding brain states and understanding cognitive functions. However, current LLMs are not specifically designed to process complex, structured numerical inputs with the same efficiency. As a result, while the five-channel prompts are substantially longer, requiring more than twice the FLOPs of the single-channel prompts, it is unclear whether this additional information could be reliably leveraged by the model for its task. In particular, the mechanisms by which an LLM would resolve conflicting EEG channels, or channel-specific noise in band power values, are not obvious. Rather, we speculate that the Gemma model might show a better performance in the single-channel case precisely because the limited, focused information allows this model to rely on simple heuristics, extracted from the neuroscience literature and related sources. Longer prompts featuring nested numerical data might degrade these heuristics, resulting in less accurate inferences. Still, whereas current LLMs may be limited in their capacity to integrate complex numerical inputs, future foundation models with more advanced reasoning capabilities might achieve a significantly better performance in such tasks, expanding their utility for EEG-to-fMRI prediction.
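To make the single-channel setting concrete, a prompt in this spirit could look as follows; the wording, band set, and region-channel pairing are hypothetical and do not reproduce the actual prompts in the Supplementary Notebooks.

```python
def build_prompt(channel, band_powers, roi):
    """Hypothetical single-channel, single-ROI classification prompt."""
    bands = ", ".join(f"{band}: {value:.2f}" for band, value in band_powers.items())
    return (
        f"EEG channel {channel} shows the following relative band powers "
        f"({bands}). Based on these values, will the fMRI BOLD signal in the "
        f"{roi} increase or decrease? Answer with a single word: "
        "increase or decrease."
    )

prompt = build_prompt("O1", {"alpha": 0.42, "beta": 0.18, "theta": 0.25},
                      "cuneal cortex")
```

A five-channel variant would nest five such band-power listings in one prompt, more than doubling its length, which is consistent with the FLOP measurements reported above.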

5.3 Limitations

The EEG-fMRI dataset on which this research is based is limited in size, and unbalanced in terms of sex and age. Given the limited training set, the classical machine learning and deep learning models are evaluated across sessions, but not across subjects. The exact data on which the foundation models have been pre-trained, and in particular the number of neuroscience articles and books, or other sources of knowledge on EEG and fMRI patterns, is not publicly available. Running the foundation models requires significantly more computational resources than running the classical machine learning and deep learning models, which forces us to focus on a subset of our fMRI scans, band powers, and brain regions of interest, and to use an ad hoc region-channel mapping. Also because of this computational cost, we only evaluate small foundation models, which may be less performant than larger ones. In general, this research focuses on demonstrating the feasibility of the two strategies for EEG-to-fMRI prediction, and does not aim to achieve the best performance possible.

5.4 Future directions

For the first strategy, a natural path forward would be to explore additional preprocessing pipelines and neural network architectures, and to experiment with more specific models, relying more strongly on our knowledge of the human brain. We would certainly benefit from larger EEG-fMRI datasets, more balanced in terms of sex and age, if such datasets become available in the future. Since the scarcity and heterogeneity of EEG-fMRI datasets is currently a significant obstacle, developing methods for integrating heterogeneous datasets (e.g., different EEG montages, fMRI scan durations, etc.) could also be highly valuable. For the second strategy, the possible future directions are relatively symmetrical. It would make sense to explore additional prompt engineering techniques and model parameters, and to experiment with foundation models specifically pre-trained on the neuroscience literature. We would certainly benefit from more advanced and general foundation models, which will likely become available in the future, in particular if these models have native CoT capabilities, potentially allowing them to reliably leverage multichannel information, and to explicitly use cognitive functions as an intermediate reasoning step between EEG and fMRI. Developing methods for successfully integrating the two strategies, for example by fine-tuning foundation models on existing EEG-fMRI datasets, could also be a promising path. Finally, both strategies could be extended beyond EEG and fMRI, to other neuroimaging modalities such as magnetoencephalography, bringing us a step closer to a multimodal foundation model for neuroscience.
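One possible sketch of such dataset integration (a toy example with hypothetical channel names, not a proposed method): restrict each recording to the channels of a shared reference montage and resample every recording to a common number of samples.

```python
import numpy as np
from scipy.signal import resample

def harmonize(recordings, target_channels, target_len):
    """Toy harmonization of heterogeneous EEG recordings: reorder each
    recording to a shared reference montage (every recording is assumed to
    contain all target channels) and resample to a common length.

    recordings: list of (channel_names, data) tuples, with data shaped
    (n_channels, n_samples); returns arrays shaped
    (len(target_channels), target_len)."""
    harmonized = []
    for names, data in recordings:
        idx = [names.index(ch) for ch in target_channels]
        harmonized.append(resample(np.asarray(data)[idx], target_len, axis=1))
    return harmonized
```

Real integration would of course also need to handle missing channels, differing reference schemes, and varying fMRI scan durations, which is precisely why we flag it as an open methodological question.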

6 Conclusion

In this research, we demonstrate the feasibility of predicting fMRI activity from EEG activity by following two distinct strategies: training classical machine learning and deep learning models on an EEG-fMRI dataset, or leveraging the capabilities of pre-trained foundation models. When this prediction objective is formulated as a classification task, the RF, SVM, and MLP models stand out as particularly effective, while the Gemma model prompted with a single EEG channel achieves statistically reliable gains over the baseline, whether it is used with direct prompting or with a CoT approach. The latter case demonstrates that the Gemma model can infer the cognitive functions associated with EEG data, and subsequently predict the fMRI data from these cognitive functions. Although the observed effects are much smaller for the foundation models than for the classical machine learning and deep learning models, the possibility to leverage LLMs and LMMs for this task, and potentially to integrate the two strategies by fine-tuning foundation models on existing EEG-fMRI datasets, could open new horizons for EEG-to-fMRI prediction. As more advanced and general LLMs and LMMs continue to be developed, these models could become increasingly important tools for enhancing neural interfaces and advancing toward a multimodal foundation model for neuroscience.

6.1 Reproducibility

The experiments in this research are based on the publicly available EEG-fMRI NF dataset A multi-modal human neuroimaging dataset for data integration: simultaneous EEG and fMRI acquisition during a motor imagery neurofeedback task: XP1, released under the CC0 license and accessible via OpenNeuro at https://openneuro.org/datasets/ds002336/versions/2.0.2. The Gemma, Llama, and PaliGemma models are also publicly available. The code is provided in fully commented Jupyter Notebooks (Supplementary Notebooks 1–27), designed to be run in a Conda environment specified by Supplementary ENV.yml. The fMRI preprocessing using fMRIPrep is documented in Supplementary README.md, and must be completed before running the experiments. On a standard personal computer (e.g., MacBook Pro), the fMRIPrep preprocessing takes around 24–36 h, while the full execution of the code requires an additional 8–12 h.

6.2 Societal impact

This research does not introduce a new deployable system or asset, and is not expected to have an immediate societal impact. However, by contributing to the long-term objective of EEG-to-fMRI prediction, it could eventually support the development of enhanced neural interfaces, designed to achieve near-fMRI precision while retaining the affordability and wearability of EEG devices. While such neural interfaces remain hypothetical, the possibility to monitor cognitive processes at scale could have a profound impact in many domains, and would naturally require careful reflection and evaluation, considering the sensitive nature of human brain data.

Statements

Data availability statement

Publicly available datasets were analyzed in this study. This data can be found here: https://openneuro.org/datasets/ds002336/versions/2.0.2.

Ethics statement

Ethical approval was not required for the study involving humans in accordance with the local legislation and institutional requirements. Written informed consent to participate in this study was not required from the participants or the participants’ legal guardians/next of kin in accordance with the national legislation and the institutional requirements.

Author contributions

MD: Writing – original draft, Writing – review and editing.

Funding

The authors declare that no financial support was received for the research and/or publication of this article.

Conflict of interest

The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Generative AI statement

The authors declare that Generative AI was used in the creation of this manuscript. Generative AI was used at several stages of this research to: (1) support the literature review, (2) support code writing and debugging to a limited extent, (3) contribute to the establishment of an ad hoc region–channel mapping, as explicitly noted in the manuscript, and (4) improve the readability of the text through English correction. All final decisions regarding the manuscript content were performed by the author.

Any alternative text (alt text) provided alongside figures in this article has been generated by Frontiers with the support of artificial intelligence and reasonable efforts have been made to ensure accuracy, including review by the authors wherever possible. If you identify any issues, please contact us.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Supplementary material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fsysb.2025.1715692/full#supplementary-material

References

  • 1

    Abadi M. Barham P. Chen J. Chen Z. Davis A. Dean J. et al (2016). ā€œTensorFlow: a system for large-scale machine learning,ā€ in 12th USENIX symposium on operating systems design and implementation (OSDI 16), 265–283.

  • 2

    Abreu R. Leal A. Figueiredo P. (2018). EEG-informed fMRI: a review of data analysis methods. Front. Hum. Neurosci.12 (29). 10.3389/fnhum.2018.00029

  • 3

    Alexander W. H. Brown J. W. (2011). Medial prefrontal cortex as an action-outcome predictor. Nat. Neurosci.14 (10), 1338–1344. 10.1038/nn.2921

  • 4

    Alexandre A. Pedregosa F. Eickenberg M. Gervais P. Mueller A. Kossaifi J. et al (2014). Machine learning for neuroimaging with Scikit-learn. Front. Neuroinformatics8 (14), 14. 10.3389/fninf.2014.00014

  • 5

    Badre D. Hoffman J. Cooney J. W. D’Esposito M. (2009). Hierarchical cognitive control deficits following damage to the human frontal lobe. Nat. Neurosci.12 (4), 515–522. 10.1038/nn.2277

  • 6

    Bansal R. Samanta B. Dalmia S. Gupta N. Vashishth S. Ganapathy S. et al (2024). LLM augmented LLMs: expanding capabilities through composition. arXiv Preprint arXiv:2401.02412.

  • 7

    Bommasani R. Hudson D. A. Adeli E. Altman R. Arora S. von Arx S. et al (2021). On the opportunities and risks of foundation models. arXiv Preprint arXiv:2108.07258.

  • 8

    Boorman E. D. Behrens T. E. J. Woolrich M. W. Rushworth M. F. S. (2009). How green is the grass on the other side? Frontopolar cortex and the evidence in favor of alternative courses of action. Neuron62 (5), 733–743. 10.1016/j.neuron.2009.05.014

  • 9

    Brett M. Markiewicz C. Hanke M. CƓtƩ M.-A. Cipollini B. Paul M. C. et al (2020). NiBabel: neuroimaging data access. Softw. Library.

  • 10

    Brohan A. Chebotar Y. Finn C. Hausman K. Herzog A. Ho D. et al (2023). ā€œDo as I can, not as I say: grounding language in robotic affordances,ā€ in Conference on robot learning (PMLR), 287–318.

  • 11

    Calhas D. Henriques R. (2023a). ā€œEEG to fMRI synthesis benefits from attentional graphs of electrode relationships,ā€ in Proceedings of the 8th Machine Learning for Healthcare Conference (PMLR) 219, 76–93.

  • 12

    Calhas D. Henriques R. (2023b). ā€œEEG to fMRI synthesis: is deep learning a candidate?,ā€ in Information systems development, organizational aspects and societal trends (ISD2023 proceedings) (Lisbon, Portugal: Instituto Superior TĆ©cnico).

  • 13

    Chen T. Guestrin C. (2016). ā€œXGBoost: a scalable tree boosting system,ā€ in Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, 785–794.

  • 14

    Cho K. Van MerriĆ«nboer B. Gulcehre C. Bahdanau D. Bougares F. Schwenk H. et al (2014). Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv Preprint arXiv:1406.1078, 1724–1734. 10.3115/v1/d14-1179

  • 15

    Ciccarelli G. Federico G. Mele G. Di Cecca A. Migliaccio M. Rosario Ilardi C. et al (2023). Simultaneous real-time EEG-fMRI neurofeedback: a systematic review. Front. Hum. Neurosci.17, 1123014. 10.3389/fnhum.2023.1123014

  • 16

    Cui W. Jeong W. Thƶlke P. Medani T. Karim J. Joshi A. A. et al (2024). ā€œNeuro-GPT: towards a foundation model for EEG,ā€ in 2024 IEEE international symposium on biomedical imaging (ISBI) (IEEE), 1–5.

  • 17

    Cury C. Maurel P. Gribonval R. Barillot C. (2020). A sparse EEG-informed fMRI model for hybrid EEG-fMRI neurofeedback prediction. Front. Neurosci.13, 1451. 10.3389/fnins.2019.01451

  • 18

    Daw N. D. Gershman S. J. Seymour B. Dayan P. Dolan R. J. (2011). Model-based influences on humans’ choices and striatal prediction errors. Neuron69 (6), 1204–1215. 10.1016/j.neuron.2011.02.027

  • 19

    Deligianni F. Centeno M. Carmichael D. W. Clayden J. D. (2014). Relating resting-state fMRI and EEG whole-brain connectomes across frequency bands. Front. Neurosci.8, 258. 10.3389/fnins.2014.00258

  • 20

    Dettmers T. Pagnoni A. Holtzman A. Zettlemoyer L. (2023). QLoRA: efficient finetuning of quantized LLMs. Adv. Neural Inf. Process. Syst.36, 10088–10115.

  • 21

    Diederik P. K. Jimmy Ba (2014). Adam: a method for stochastic optimization. arXiv Preprint arXiv:1412.6980.

  • 22

    Donoso M. Collins A. G. E. Koechlin E. (2014). Foundations of human reasoning in the prefrontal cortex. Science344 (6191), 1481–1486. 10.1126/science.1252254

  • 23

    Esteban O. Markiewicz C. J. Blair R. W. Moodie C. A. Isik A. I. Erramuzpe A. et al (2019). fMRIPrep: a robust preprocessing pipeline for functional MRI. Nat. Methods16 (1), 111–116. 10.1038/s41592-018-0235-4

  • 24

    Gary H. G. (1999). Deconvolution of impulse response in event-related BOLD fMRI. NeuroImage9 (4), 416–429. 10.1006/nimg.1998.0419

  • 25

    Gorgolewski K. J. Auer T. Calhoun V. D. Craddock R. C. Das S. Duff E. P. et al (2016). The brain imaging data structure, a format for organizing and describing outputs of neuroimaging experiments. Sci. Data3 (1), 1–9. 10.1038/sdata.2016.44

  • 26

    Gottweis J. Natarajan V. (2025). Accelerating scientific breakthroughs with an AI co-scientist. Google Research Blog.

  • 27

    Gramfort A. Luessi M. Larson E. Engemann D. A. Strohmeier D. Brodbeck C. et al (2013). MEG and EEG data analysis with MNE-python. Front. Neuroinformatics7, 267. 10.3389/fnins.2013.00267

  • 28

    Grattafiori A. Dubey A. Jauhri A. Pandey A. Kadian A. Al-Dahle A. et al (2024). The Llama 3 herd of models. arXiv Preprint arXiv:2407.21783.

  • 29

    Griffin C. Wallace D. Mateos-Garcia J. Hanna S. Kohli P. (2024). A new golden age of discovery. Google DeepMind Essay.

  • 30

    Grover Roos K. Fukuda A. Cap Q. H. (2025). From brainwaves to brain scans: a robust neural network for EEG-to-fMRI synthesis. arXiv Preprint arXiv:2502.08025.

  • 31

    Hayes T. Rao R. Akin H. Sofroniew N. J. Oktay D. Lin Z. et al (2025). Simulating 500 million years of evolution with a language model. Science387 (6736), 850–858. 10.1126/science.ads0018

  • 32

    Hu E. J. Shen Y. Wallis P. Allen-Zhu Z. Li Y. Wang S. et al (2022). LoRA: low-rank adaptation of large language models. ICLR1 (2), 3.

  • 33

    Jason W. Wang X. Schuurmans D. Bosma M. Xia F. Chi Q. V. Le et al (2022). Chain-of-Thought prompting elicits reasoning in large language models. Adv. Neural Inf. Process. Syst.35, 24824–24837.

  • 34

    Jiang W.-B. Zhao L.-M. Lu B.-L. (2024a). Large brain model for learning generic representations with tremendous EEG data in BCI. arXiv Preprint arXiv:2405.18765.

  • 35

    Jiang W.-B. Wang Y. Lu B.-L. Li D. (2024b). NeuroLM: a universal multi-task foundation model for bridging the gap between language and EEG signals. arXiv Preprint arXiv:2409.00101.

  • 36

    Kennedy D. N. Haselgrove C. Fischl B. Breeze J. L. Frazier J. A. Seidman L. J. et al (2003). Harvard-oxford cortical structural atlas. Distributed with FSL.

  • 37

    Keynan J. N. Meir-Hasson Y. Gilam G. Cohen A. Jackont G. Kinreich S. et al (2016). Limbic activity modulation guided by functional magnetic resonance imaging–inspired electroencephalography improves implicit emotion regulation. Biol. Psychiatry80 (6), 490–496. 10.1016/j.biopsych.2015.12.024

  • 38

    Koechlin E. Ody C. Kouneiher F. (2003). The architecture of cognitive control in the human prefrontal cortex. Science302 (5648), 1181–1185. 10.1126/science.1088545

  • 39

    Kolling N. Behrens T. E. J. Mars R. B. Rushworth M. F. S. (2012). Neural mechanisms of foraging. Science336 (6077), 95–98. 10.1126/science.1216930

  • 40

    Kostas D. Aroca-Ouellette S. Rudzicz F. (2021). BENDR: using transformers and a contrastive self-supervised learning task to learn from massive amounts of EEG data. Front. Hum. Neurosci.15, 653659. 10.3389/fnhum.2021.653659

  • 41

    Kovalev A. Mikheev I. Ossadtchi A. (2022). fMRI from EEG is only deep learning away: the use of interpretable DL to unravel EEG-fMRI relationships. arXiv Preprint arXiv:2211.02024.

  • 42

    Lanzino R. Fontana F. Cinque L. Scarcello F. Maki A. (2024). NT-ViT: neural transcoding vision transformers for EEG-to-fMRI synthesis. arXiv Preprint arXiv:2409.11836.

  • 43

    Lebreton M. Jorge S. Michel V. Thirion B. Pessiglione M. (2009). An automatic valuation system in the human brain: evidence from functional neuroimaging. Neuron 64 (3), 431–439. 10.1016/j.neuron.2009.09.040

  • 44

    LeCun Y. Bottou L. Bengio Y. Haffner P. (1998). Gradient-based learning applied to document recognition. Proc. IEEE 86 (11), 2278–2324. 10.1109/5.726791

  • 45

    Liu X. Sajda P. (2019). “A convolutional neural network for transcoding simultaneously acquired EEG-fMRI data,” in 2019 9th international IEEE/EMBS conference on neural engineering (NER) (IEEE), 477–482.

  • 46

    Li Y. Lou A. Xu Z. Wang S. Chang C. (2024a). Leveraging sinusoidal representation networks to predict fMRI signals from EEG. Med. Imaging 2024 Image Process. 12926, 795–800. 10.1117/12.3007677

  • 47

    Li Y. Lou A. Xu Z. Zhang S. Wang S. Englot D. et al (2024b). NeuroBOLT: resting-state EEG-to-fMRI synthesis with multi-dimensional feature mapping. Adv. Neural Inf. Process. Syst. 37, 23378–23405. 10.52202/079017-0736

  • 48

    Lioi G. Cury C. Perronnet L. Mano M. Bannier E. Lécuyer A. et al (2020). Simultaneous EEG-fMRI during a neurofeedback task, a brain imaging dataset for multimodal data integration. Sci. Data 7 (1), 173. 10.1038/s41597-020-0498-3

  • 49

    Loshchilov I. Hutter F. (2017). Decoupled weight decay regularization. arXiv Preprint arXiv:1711.05101.

  • 50

    Ma Y. Liu Y. Chen L. Zhu G. Chen B. Zheng N. (2025). BrainCLIP: brain representation via CLIP for generic natural visual stimulus decoding. IEEE Trans. Med. Imaging 44, 3962–3972. 10.1109/TMI.2025.3537287

  • 51

    Meir-Hasson Y. Kinreich S. Podlipsky I. Hendler T. Intrator N. (2014). An EEG finger-print of fMRI deep regional activation. NeuroImage 102, 128–141. 10.1016/j.neuroimage.2013.11.004

  • 52

    Ogawa S. Lee T.-M. Kay A. R. Tank D. W. (1990). Brain magnetic resonance imaging with contrast dependent on blood oxygenation. Proc. Natl. Acad. Sci. 87 (24), 9868–9872. 10.1073/pnas.87.24.9868

  • 53

    Ogg M. Coon W. G. (2024). “Self-supervised transformer model training for a sleep-EEG foundation model,” in 2024 46th annual international conference of the IEEE engineering in medicine and biology society (EMBC) (IEEE), 1–6.

  • 54

    Or-Borichev A. Gurevitch G. Klovatch I. Greental A. Lerner Y. Levy D. J. et al (2023). Neural and functional validation of fMRI-informed EEG model of right inferior frontal gyrus activity. NeuroImage 266, 119822. 10.1016/j.neuroimage.2022.119822

  • 55

    Ortega Caro J. Fonseca A. H. de O. Averill C. Rizvi S. A. Rosati M. Cross J. L. et al (2023). BrainLM: a foundation model for brain activity recordings. bioRxiv.

  • 56

    O’Doherty J. Dayan P. Schultz J. Deichmann R. Friston K. Dolan R. J. (2004). Dissociable roles of ventral and dorsal striatum in instrumental conditioning. Science 304 (5669), 452–454. 10.1126/science.1094285

  • 57

    Pedregosa F. Varoquaux G. Gramfort A. Michel V. Thirion B. Grisel O. et al (2011). Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830.

  • 58

    Pessiglione M. Schmidt L. Draganski B. Kalisch R. Lau H. Dolan R. J. et al (2007). How the brain translates money into force: a neuroimaging study of subliminal motivation. Science 316 (5826), 904–906. 10.1126/science.1140459

  • 59

    Plassmann H. O’Doherty J. Rangel A. (2007). Orbitofrontal cortex encodes willingness to pay in everyday economic transactions. J. Neurosci. 27 (37), 9984–9988. 10.1523/JNEUROSCI.2131-07.2007

  • 60

    Poldrack R. A. Barch D. M. Mitchell J. P. Wager T. D. Wagner A. D. Devlin J. T. et al (2013). Toward open sharing of task-based fMRI data: the OpenfMRI project. Front. Neuroinformatics 7 (12), 12. 10.3389/fninf.2013.00012

  • 61

    Riviere M. Pathak S. Giuseppe Sessa P. Hardin C. Bhupatiraju S. Hussenot L. et al (2024). Gemma 2: improving open language models at a practical size. arXiv Preprint arXiv:2408.00118.

  • 62

    Sato J. R. Rondinoni C. Sturzbecher M. de Araujo D. B. Amaro E. Jr (2010). From EEG to BOLD: brain mapping and estimating transfer functions in simultaneous EEG-fMRI acquisitions. NeuroImage 50 (4), 1416–1426. 10.1016/j.neuroimage.2010.01.075

  • 63

    Semenkov I. Rudych P. Ossadtchi A. (2024). Beyond the surface: revealing the depths of brain activity by predicting fMRI from EEG with deep learning. bioRxiv.

  • 64

    Shi C. Wang Y. Wu Y. Chen S. Hu R. Zhang M. et al (2023). Self-supervised pretraining improves the performance of classification of task functional magnetic resonance imaging. Front. Neurosci. 17, 1199312. 10.3389/fnins.2023.1199312

  • 65

    Simoes M. Abreu R. Direito B. Sayal A. Castelhano J. Carvalho P. et al (2020). How much of the BOLD-fMRI signal can be approximated from simultaneous EEG data: relevance for the transfer and dissemination of neurofeedback interventions. J. Neural Eng. 17 (4), 046007. 10.1088/1741-2552/ab9a98

  • 66

    Singer N. Poker G. Dunsky-Moran N. Nemni S. Balter S. R. Doron M. et al (2023). Development and validation of an fMRI-informed EEG model of reward-related ventral striatum activation. NeuroImage 276, 120183. 10.1016/j.neuroimage.2023.120183

  • 67

    Steiner A. Pinto A. S. Tschannen M. Keysers D. Wang X. Bitton Y. et al (2024). PaliGemma 2: a family of versatile VLMs for transfer. arXiv Preprint arXiv:2412.03555.

  • 68

    Vallat R. Walker M. P. (2021). An open-source, high-performance tool for automated sleep staging. eLife 10, e70092. 10.7554/eLife.70092

  • 69

    Van Herck J. Gil M. V. Jablonka K. M. Abrudan A. Anker A. S. Asgari M. et al (2025). Assessment of fine-tuned large language models for real-world chemistry and material science applications. Chem. Sci. 16 (2), 670–684. 10.1039/d4sc04401k

  • 70

    Vaswani A. Shazeer N. Parmar N. Uszkoreit J. Jones L. Gomez A. N. et al (2017). Attention is all you need. Adv. Neural Inf. Process. Syst. 30.

  • 71

    Virtanen P. Gommers R. Oliphant T. E. Haberland M. Reddy T. Cournapeau D. et al (2020). SciPy 1.0: fundamental algorithms for scientific computing in Python. Nat. Methods 17 (3), 261–272. 10.1038/s41592-019-0686-2

  • 72

    Wang R. Chen Z. S. (2024). Large-scale foundation models and generative AI for BigData neuroscience. Neurosci. Res. 215, 3–14. 10.1016/j.neures.2024.06.003

  • 73

    Warbrick T. (2022). Simultaneous EEG-fMRI: what have we learned and what does the future hold? Sensors 22 (6), 2262. 10.3390/s22062262

  • 74

    Wolf T. Debut L. Sanh V. Chaumond J. Delangue C. Moi A. et al (2020). “Transformers: state-of-the-art natural language processing,” in Proceedings of the 2020 conference on empirical methods in natural language processing: system demonstrations, 38–45.

Keywords

EEG-to-fMRI prediction, EEG, fMRI, foundation model, neuroimaging, neurofeedback, LLM, chain of thought

Citation

Donoso M (2025) Neural networks and foundation models: two strategies for EEG-to-fMRI prediction. Front. Syst. Biol. 5:1715692. doi: 10.3389/fsysb.2025.1715692

Received

29 September 2025

Revised

16 November 2025

Accepted

24 November 2025

Published

17 December 2025

Volume

5 - 2025

Edited by

Robert Andrew McDougal, Yale University, United States

Reviewed by

Ali H. Rafati, Aarhus University, Denmark

Linze Qian, Zhejiang University, China

Copyright

*Correspondence: Maƫl Donoso,

Disclaimer

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.
