Mapping football tactical behavior and collective dynamics with artificial intelligence: a systematic review

Teixeira, José E.; Maio, Eduardo; Afonso, Pedro; Encarnação, Samuel; Machado, Guilherme F.; Morgans, Ryland; Barbosa, Tiago M.; Monteiro, António M.; Forte, Pedro; Ferraz, Ricardo; Branquinho, Luís

doi:10.3389/fspor.2025.1569155

REVIEW article

Front. Sports Act. Living, 30 May 2025

Sec. Sports Science, Technology and Engineering

Volume 7 - 2025 | https://doi.org/10.3389/fspor.2025.1569155

This article is part of the Research Topic: Harnessing Artificial Intelligence in Sports Science: Enhancing Performance, Health, and Education


José E. Teixeira^1,2,3,4,5,6

Eduardo Maio^7,8,9

Pedro Afonso^7,8

Samuel Encarnação^5,10,11

Guilherme F. Machado^12,13

Ryland Morgans^14

Tiago M. Barbosa^5,10

António M. Monteiro^5,10

Pedro Forte^5,6,10

Ricardo Ferraz^4,9

Luís Branquinho^4,7,6,8*

¹Department of Sports Sciences, Polytechnic of Guarda, Guarda, Portugal
²Department of Sports Sciences, Polytechnic of Cávado and Ave, Guimarães, Portugal
³SPRINT—Sport Physical Activity and Health Research & Innovation Center, Guarda, Portugal
⁴Research Center in Sports, Health and Human Development, Covilhã, Portugal
⁵Research Center for Active Living and Wellbeing (LiveWell), Polytechnic Institute of Bragança, Bragança, Portugal
⁶CI-ISCE, Instituto Superior de Ciências Educativas do Douro (ISCE Douro), Penafiel, Portugal
⁷Biosciences Higher School of Elvas, Polytechnic Institute of Portalegre, Portalegre, Portugal
⁸Life Quality Research Center (LQRC-CIEQV), Santarém, Portugal
⁹Department of Sport Sciences, University of Beira Interior, Covilhã, Portugal
¹⁰Department of Sports Sciences, Polytechnic Institute of Bragança, Bragança, Portugal
¹¹Department of Physical Education, Sport and Human Movement, Universidad Autónoma de Madrid (UAM), Ciudad Universitaria de Cantoblanco, Madrid, Spain
¹²Centre of Research and Studies in Soccer (NUPEF), Universidade Federal de Viçosa, Viçosa, Brazil
¹³Scientific Department and Department of Athletes' Integration and Development, Paulista Football Federation (FPF), São Paulo, Brazil
¹⁴School of Sport and Health Sciences, Cardiff Metropolitan University, Cardiff, United Kingdom

Football, as a dynamic and complex sport, demands an understanding of tactical behaviors to excel in training and competition. Artificial intelligence (AI) has revolutionized tactical performance analysis in football, offering unprecedented data analytics insights for players, coaches, and analysts. This systematic review aims to examine and map out the current state of research on AI-based tactical behavior, collective dynamics, and movement patterns in football. A total of 2,548 articles were identified following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses guidelines and the Population-Intervention-Comparators-Outcomes framework. By synthesizing findings from 32 studies, this review elucidates the available AI-based techniques to analyze tactical behavior and identify collective dynamics based on artificial neural networks, deep learning, machine learning, and time-series techniques. Concretely, tactical behavior was expressed by spatiotemporal tracking data using convolutional neural networks, recurrent neural networks, variational recurrent neural networks, variational autoencoders, the Delaunay method, PlayerRank, hierarchical clustering, logistic regression, XGBoost, random forest classifier, repeated incremental pruning to produce error reduction, principal component analysis, and t-distributed stochastic neighbor embedding. Furthermore, collective dynamics and patterns were mapped by graph metrics such as betweenness centrality, eccentricity, efficiency, vulnerability, clustering coefficient, and PageRank, as well as expected possession value, pitch control map classifier, computer vision techniques, expected goals, 3D ball trajectories, dangerousity assessment, pass probability model, and total passes attempted.
The performance of technical-tactical key indicators was expressed by team possession, team formation, team strategy, team-space control efficiency, determining team formations, coordination patterns, analyzing player interactions, ball trajectories, and pass effectiveness. In conclusion, the AI-based models can effectively reshape the landscape of spatiotemporal tracking data into training and practice routines with real-time decision-making support, performance prediction, match management, tactical-strategic thinking, and training task design. Nevertheless, there are still challenges for the real practical application of AI-based techniques, as well as ethical regulation and the formation of professional profiles that combine sports science, data analytics, computer science, and coaching expertise.

1 Introduction

Football has been described as a complex, dynamic, and non-linear system, in which the confrontation of two teams depends on constant adaptation to technical-tactical actions, situational factors, and ever-changing game situations (1, 2). A football team's performance and success depend on a deep comprehension of collective behavior, encompassing everything from individual player movements to the interdependence between game model, strategy, and opposing systems (3, 4). However, the practical operationalization of all the dimensions that influence tactical behavior and patterns has led to the development of complex and time-consuming methodologies, highly dependent on experience and susceptible to human error (3, 5). Although the automation of information collection through tracking systems based on the global positioning system (GPS) or Global Navigation Satellite Systems, local position measurement (LPM), or video-based motion analysis (VBMA) is already widespread among technical teams (6–8), the quantification of this information, the visualization of datasets, and the dynamics of the work teams have undergone some transformation in recent years (9, 10).

In recent years, the integration of artificial intelligence (AI) techniques has revolutionized the analysis of tactical behaviors in football, offering unparalleled insights and opportunities for enhancement across various facets of the game. Data science and data analytics departments have been springing up in football clubs, exploring data analysis routines and procedures that typically apply AI techniques such as artificial neural networks (ANNs), deep learning (DL), and machine learning (ML) (2, 11). All these procedures require advanced computing environments and can be developed using supervised or unsupervised trainable algorithms (12, 13). Typically, this type of analysis is based on two datasets: spatiotemporal data (14, 15) and key performance indicators (KPIs) (16, 17). On the one hand, spatiotemporal data are based on the time-series (TS) analysis of the raw data (x, y, z) of the individual and collective positions that the tracking systems provide. On the other hand, KPIs are based on notational and observational analysis, major areas of match analysis 1.0 and 2.0, which allow performance to be assessed in individual and collective actions (3, 18). All these datasets have been gaining ground in an integrative view of all football performance dimensions, especially physical, physiological (9, 19), and technical-tactical factors (20–22). The integrative view of the data allows us to better understand the preponderant factors in the interdependence of the match-related factors, intra- and intercoordination team formation, playing style, or tactical-strategic nuances (4, 21, 23).

The integration of AI in football analysis has ushered in a new era of understanding, enabling players, coaches, and performance analysts to glean deeper insights into big data (11, 13). With advancements in technology, researchers and practitioners can now decipher patterns and trends that were previously inaccessible, thereby accurately informing decision-making processes during training and competition (11, 13, 24). Moreover, AI-based tactical behavior mapping holds immense promise in enhancing player development, refining coaching strategies, and elevating the overall standard of match analysis (3, 5). However, a comprehensive overview of football analytics remains to be established in the literature, which would allow for an in-depth examination of the intersection between AI and tactical behavior mapping (11, 13, 24). Football matches can be evaluated physiologically, technically, and tactically in a dynamic manner (match analysis level 3.0–4.0) with the use of spatiotemporal data (25).

Through a comprehensive evaluation of existing literature, this systematic review endeavors to identify the strengths and limitations of current approaches, while also illuminating avenues for future research and technological advancements in this burgeoning field (4, 26). By synthesizing disparate findings and insights, this review aims to provide a comprehensive understanding of the role of AI in augmenting tactical awareness and optimizing performance in football (11, 13, 24). Thus, this systematic review aims to examine and map out the current state of research on AI-based tactical behavior, collective dynamics, and movement patterns in football. By synthesizing findings from a diverse range of studies, this review seeks to shed light on the methodologies, technologies, and outcomes employed in the analysis of tactical behaviors within the realm of football. Specifically, it explores the utilization of neural networks, ML algorithms, computer vision techniques, and data analytics frameworks for extracting actionable insights from player movements, team formations, and strategic decision-making processes.

2 Materials and methods

2.1 Literature search strategy

The Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines and the Population-Intervention-Comparators-Outcomes (PICOS) design were followed to conduct this systematic review (27, 28). The literature search was based on seven academic databases and digital libraries: Web of Science (WoS, including all Web of Science Core Collection: Citation Indexes), PubMed/Medline, Science Direct (SCOPUS), SportDiscus, ACM Digital Library, IEEE Xplore Digital Library, and arXiv.org (e-Print archive). The eligibility criteria were established following the PICOS approach, and the search strategy was defined as follows: (1) Population: adult and youth football players (≥14 years old); (2) Intervention: AI-based analysis of offensive and defensive football patterns; (3) Comparison: AI techniques (ML, DL, neural networks); (4) Outcomes: tactical behavior and collective dynamics; (5) Study design: experimental and quasi-experimental designs. In accordance with the search strategy, relevant publications from January 2000 to April 2025 were searched using the keywords presented in Table 1. In addition, the study variables were combined into a Boolean search phrase (Table 1).


Table 1. Search terms and following keywords for screening procedures.

The databases were accessed between January and March 2025. The search strategy was independently conducted by one review author and checked by a second author. Discrepancies between the authors in the study selection were resolved with the support of a third reviewer. No authors or journals were prioritized.

2.2 Selection criteria

The studies included in the present review met the following inclusion criteria: (1) studies applying AI algorithms and techniques to analyze behavior and tactical patterns in football in adult and youth competitions of both sexes; (2) studies with screening procedures based on ANN, DL, ML, TS, and technical-tactical KPI; (3) only studies that included an AI-based method to express tactical analysis (i.e., team formation, style, patterns, networks); (4) observational prospective cohort, case-control, and/or cross-sectional study designs including at least one game dataset; (5) studies with human physical and physiological performance in sport science as their scope; (6) original articles published in a peer-reviewed journal; (7) full text available in English; and (8) articles reporting sample and screening procedures (e.g., data collection, study design, instruments, and outcomes).

The exclusion criteria were as follows: (1) studies applying AI algorithms and techniques to other key outcomes in football (e.g., predicting match outcome, player selection, injury prevention, isolated physical/physiological performance); (2) studies without screening procedures based specifically on ANN, DL, ML, TS, and KPI; (3) studies applying spatiotemporal data or KPI without AI-based analysis to map out tactical behavior, collective dynamics, and movement patterns; (4) AI-based studies in other football codes (i.e., rugby, Australian football, Gaelic football, beach football, and futsal); (5) other research areas and non-human participants; (6) articles with a poor-quality description of the study sample and screening procedures (e.g., data collection, study design, instruments, and measures) according to the Downs and Black scale; (7) reviews, conference abstracts/papers, surveys, opinion pieces, commentaries, books, periodicals, editorials, case studies, non-peer-reviewed texts, and master's and/or doctoral theses.

The search covered only original articles published online up until April 2025. First, titles and abstracts were screened and retained or rejected based on the predetermined criteria. The selection process used to establish the final status (inclusion or exclusion) was then applied to full-text articles. Disagreements were settled by discussion between the two authors or, if needed, by a third researcher. Additional pertinent secondary sources went through the same screening processes.

2.3 Quality assessment

Following the PRISMA statement, a systematic search of relevant English-language articles published between 2000 and 2025 was performed (27, 28). The methodological quality of the studies was evaluated using the modified Downs and Black Quality Index (29), comprising 14 items. Higher scores indicated higher-quality studies, with scores above 0.6 considered indicative of superior methodological quality. Each author carried out the quality statement (QS) classification independently, and an interobserver reliability analysis was conducted afterward (Kappa index: 0.96).
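The interobserver agreement above is reported as a kappa index. As a minimal sketch of how such an index can be computed, the snippet below implements Cohen's kappa for two raters. The choice of Cohen's formulation and the rating vectors are assumptions made for illustration; they are not the review's data, and the review does not state which kappa variant was used.

```python
from collections import Counter

def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    labels = freq_a.keys() | freq_b.keys()
    expected = sum(freq_a[c] * freq_b[c] for c in labels) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1.0 - expected)

# Hypothetical item-level ratings (1 = criterion met, 0 = not met)
rater_1 = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]
rater_2 = [1, 1, 0, 1, 0, 1, 1, 1, 1, 1]
kappa = cohen_kappa(rater_1, rater_2)
```

In this invented example the raters agree on 9 of 10 items, but kappa is lower than 0.9 because part of that agreement is expected by chance.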

2.4 Study coding and data extraction

Data were extracted from the included articles as follows: (1) summary measures: (i) AI category; (ii) measures; (iii) formulas; (iv) description; (v) supervision; (vi) training algorithm; (vii) accuracy; (viii) reference (Table 2); (2) sample characteristics: (i) reference; (ii) dataset; (iii) season competition; (iv) sample (n); (v) sampling; (vi) platform; (vii) publisher; (viii) quality statement (Table 3); (3) main findings: (i) reference; (ii) dataset; (iii) season competition; (iv) sample (n); (v) sampling; (vi) platform; (vii) publisher; (viii) QS score (Table 4). The research hot topics were determined from the frequently occurring keywords identified through a bibliometric analysis using VOSviewer software (30), which clustered key AI-based themes from the included studies (Figure 1).


Table 2. Summary of measure, formula, description, and AI-based procedures of the reviewed articles.


Table 3. Sample characteristics and testing methodologies of the included studies.


Table 4. Key outcomes of the reviewed articles according to study purpose, tactical analysis, AI methods, data visualization, and data processing.


Figure 1. Most recurrent keywords in the WoS and SportDiscus collections. Hot research topics are represented by different colors according to the average year of publication, which facilitates temporal analysis. The bibliometric occurrence map was produced with the VOSviewer software from reference management files obtained through specific API requests and search queries in WoS, SportDiscus, Science Direct, PubMed, IEEE, ACM, and arXiv. ACM, ACM Digital Library; AI, Artificial Intelligence; arXiv, arXiv.org (e-Print archive); IEEE, IEEE Transactions on Knowledge and Data Engineering; WoS, Web of Science (Core Collection: Citation Indexes).

This information represents a diverse array of techniques and metrics used in AI fields, specifically ANN, DL, ML, TS, and KPIs. The ANN methods included convolutional neural networks (CNNs), long short-term memory (LSTM), recurrent neural networks (RNNs), variational recurrent neural networks (VRNNs), and variational autoencoders (VAEs). In addition, they encompassed methodologies like the Delaunay method, PlayerRank, hierarchical clustering, logistic regression (LR), XGBoost, random forest (RF) classifier, and repeated incremental pruning to produce error reduction (RIPPER), and dimensionality reduction techniques, such as principal component analysis (PCA) and t-distributed stochastic neighbor embedding (t-SNE). Furthermore, they cover graph theory metrics such as betweenness centrality, eccentricity, efficiency, vulnerability, clustering coefficient, and PageRank, along with performance indicators like expected possession value (EPV) and pitch control map classifier and various others, including computer vision techniques, expected goals (xG), 3D ball trajectories, dangerousity assessment (DA), pass probability model (PPM), and total passes attempted (TPA).
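To make one of the graph metrics listed above more concrete, the following is a minimal sketch of PageRank computed by power iteration on a weighted, directed passing network. The player labels, pass counts, and damping factor of 0.85 are hypothetical choices for illustration and do not come from any of the reviewed studies.

```python
def pagerank(pass_counts, damping=0.85, tol=1e-10, max_iter=200):
    """Power-iteration PageRank on a weighted, directed passing network.

    pass_counts maps passer -> {receiver: number of passes}.
    """
    players = sorted(
        set(pass_counts) | {r for out in pass_counts.values() for r in out}
    )
    n = len(players)
    rank = {p: 1.0 / n for p in players}
    out_total = {p: sum(pass_counts.get(p, {}).values()) for p in players}
    for _ in range(max_iter):
        # Rank held by players with no outgoing passes is spread evenly.
        dangling = sum(rank[p] for p in players if out_total[p] == 0)
        new_rank = {
            p: (1.0 - damping) / n + damping * dangling / n for p in players
        }
        for passer, receivers in pass_counts.items():
            if out_total[passer] == 0:
                continue
            for receiver, count in receivers.items():
                share = count / out_total[passer]
                new_rank[receiver] += damping * rank[passer] * share
        change = sum(abs(new_rank[p] - rank[p]) for p in players)
        rank = new_rank
        if change < tol:
            break
    return rank

# Hypothetical pass counts among four players (not real match data)
passes = {
    "GK": {"CB": 12},
    "CB": {"CM": 20, "GK": 5},
    "CM": {"ST": 15, "CB": 10},
    "ST": {"CM": 6},
}
ranks = pagerank(passes)
most_central = max(ranks, key=ranks.get)
```

In this toy network the central midfielder receives passes from both the defender and the striker, so it ends up with the highest score; in practice the edge weights would come from event or tracking data.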

Table 2 displays the measure, formula, description, and AI-based procedures of the reviewed articles. Figure 1 expresses the clustered research hot topic using AI to map tactical behaviors, collective behavior, and movement patterns in football.

3 Results

3.1 Search findings

A total of 2,548 titles were collected from four academic databases (WoS = 146; PubMed = 375; ScienceDirect = 501; and SportDiscus = 325) and three digital libraries (ACM = 230; IEEE = 425; arXiv = 546). After applying the selection criteria, 114 full-text articles were screened for eligibility, and 32 articles were retained for a final review. Figure 2 shows a PRISMA flow diagram depicting the screening procedures and search results.
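As a quick arithmetic check, the per-source counts reported above can be summed to confirm the stated total. The number of articles excluded at the full-text stage is derived here as a simple difference and is not a figure reported in the text.

```python
# Records identified per source, as reported in the text
records = {
    "WoS": 146,
    "PubMed": 375,
    "ScienceDirect": 501,
    "SportDiscus": 325,
    "ACM": 230,
    "IEEE": 425,
    "arXiv": 546,
}
total_identified = sum(records.values())

full_text_screened = 114
included = 32
# Derived value (not stated in the review): full texts not retained
excluded_at_full_text = full_text_screened - included

print(total_identified)       # 2548
print(excluded_at_full_text)  # 82
```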


Figure 2. PRISMA flowchart with search results. ACM, ACM Digital Library; AI, Artificial Intelligence; arXiv, arXiv.org (e-Print archive); IEEE, IEEE Transactions on Knowledge and Data Engineering; WoS, Web of Science (Core Collection: Citation Indexes).

3.2 Participant characteristics

Table 3 shows the participants’ characteristics of the reviewed studies. Spatiotemporal data (n = 20) and technical-tactical KPI (n = 7) were the datasets of the AI-based analyses present in the studies. In addition, three studies used both datasets concomitantly. The seasons analyzed ranged from 2006–2007 to 2021–2022, spanning 16 years. The English Premier League (EPL) was the most represented league in the AI-based analyses (n = 9). The Bundesliga and Eredivisie were represented by two studies each. Other leagues with a single reviewed study include La Liga, Serie A, Women's Champions League, FIFA World Cup (WC), and Brazilian Serie A (n = 6). Three studies analyzed datasets from 5 to 18 leagues in professional contexts and were not based exclusively on one team and/or league. Five studies did not describe the competition level or sporting season of their samples. The sample sizes across the included studies varied, ranging from as few as 2,932 passes to as many as 400,000 actions. Sampling rates included 10, 25, and 30 Hz; however, 16 studies did not report the sampling frequencies of their tracking systems. Multiple tracking data, VBMA, and KPI platforms were utilized for data collection and analysis, among which are SPADL (n = 1), STATS LLC (n = 3), not described (ND) (n = 3), Prozone (n = 3), VBMA (n = 2), InStat API (n = 1), Opta Sports (n = 3), Statsbomb (n = 1), SportsCode (n = 1), TRACAB (n = 2), GPS data (n = 2), Japan League (J1) player tracking data (n = 1), Metrica Sports (n = 1), Wyscout (n = 1), EPL player tracking data (n = 1), STATS SportVU (n = 1), DVideo (n = 1), and FIFA player tracking data (n = 1). Publishers included ACM (n = 4), ASA (n = 4), BigData (n = 1), IJSSC (n = 2), IEEE (n = 3), LNAI (n = 1), PLOS ONE (n = 1), Sci Med Footb (n = 1), Scientific Reports (n = 2), SDU (n = 1), Springer (n = 3), arXiv (n = 2), ND (n = 3), and Taylor and Francis (n = 1).
QS scores ranged from 0.65 to 0.92, indicating a moderate to high methodological quality among the included studies.

3.3 Quality assessment

The quality assessment (QA) scores in the dataset ranged from a minimum of 0.65 to a maximum of 0.92. The mean score, calculated by summing all values and dividing by the total number of studies, was approximately 0.798, indicating an overall moderate to high methodological quality among the included studies.

3.4 Data extraction

Table 4 presents a summary of football data analysis studies, outlining the methods, tactical insights, and key outcomes. Sixteen studies utilized ML techniques for technical-tactical tasks such as pass evaluation, team formation analysis, player performance evaluation, space-control efficiency quantification, and predicting defensive success. Nine studies focused on the analysis of player and ball tracking data to derive insights into various aspects of football, including shot efficiency, team strategy, defensive behaviors, and pass effectiveness. Three studies mentioned data visualization techniques such as real-time quantification, plotting offensive and defensive attack plots, and visualizing player performance on different pitch positions. Ten studies employed advanced techniques such as classic ANN and DL architectures (CNN, LSTM networks, and RNN) for purposes such as accurate training feedback, player TS analysis, modeling players’ interactions, and predicting offensive plays, each with its own set of measures, formulas, and training algorithms. For instance, the calculation of eigenvalues at specific positions in CNNs involves intricate formulas that account for convolution kernels and input feature maps, while training often relies on unsupervised learning approaches, yielding accuracies ranging from 88% to 94%. Four studies focused on evaluating player and team performance using data-driven approaches, role-aware evaluation, estimating risk and reward dimensions of passes, and multidimensional evaluation of player performance. Tactical analysis was a common theme in all included studies, including the evaluation of passes, quantifying team space-control efficiency, determining team formations, coordination transition patterns, and analyzing player interactions. The analyses ranged across micro (individual), meso (group), and macro (sector or collective) levels.

4 Discussion

This systematic review examines and maps out the current state of research on AI-based tactical behavior and collective dynamics in football. The reviewed studies employed various AI techniques to examine football performance at both individual and team levels, namely ANN, DL, ML, KPI, and TS techniques. Concretely, the AI algorithms reviewed were CNN, RNN, VRNN, LSTM, and VAE. They also include techniques such as XGBoost, RF classifier, PlayerRank, hierarchical clustering, LR, the Delaunay method, RIPPER, and dimensionality reduction techniques (PCA, t-SNE). Furthermore, they cover graph theory metrics such as betweenness centrality, eccentricity, efficiency, vulnerability, clustering coefficient, and PageRank, along with performance indicators like EPV and pitch control map classifier and various others, including computer vision techniques, xG, 3D ball trajectories, DA, PPM, and TPA. The technical-tactical KPI was expressed by team possession, team formation, team strategy, team space-control, coordination patterns, player interactions, ball trajectories, and pass effectiveness.

In the realm of performance tactical analysis utilizing AI-based algorithms, passing style becomes a defining descriptor, dictating the rhythm and flow of the game. Team formation serves as the canvas upon which strategies are crafted. Each team exhibits a unique style, manifested through spatial movement patterns and the nuanced behaviors of both individual players and the collective dynamics. Within this tapestry of play, goal-scoring patterns reveal themselves, defensive behaviors take shape, and in-game behaviors offer valuable insights. Pass effectiveness becomes a crucial metric, measuring a team's tactical performance and influencing the ultimate match outcome. Space-control efficiency transforms into a battleground, where teams compete for dominance over playing space, leveraging data-driven insights to evaluate performance and rank players accordingly. Amid the game's dynamic nature, the risk and reward of passes are constantly assessed, providing accurate training feedback and informing strategic decisions. Player TS analysis offers a deeper understanding of individual performance, while the pursuit of goal-scoring chances drives comprehensive team performance analysis. In this ever-evolving and non-linear dynamic landscape, team formations shift playing positions, with each movement influencing the trajectory of the game. Also, the trajectories of 3D balls carve through space, revealing the dynamism of team performance and the classified types of ball possession and control. Shot efficiency becomes a cornerstone of team strategy as players navigate the complex interplay between technical skill and tactical opportunity. Ultimately, this tactical dynamic transcends the boundaries of the game, offering a glimpse into the intricate world of sports analytics, where spatiotemporal data and data analytics converge to unlock the secrets of victory.

Football data researchers, practitioners, and analysts have been delving deep into the intricate dynamics of the game, seeking to unveil the patterns, styles, and strategies that underlie the sport's essence. Among these endeavors, Clijmans et al. (31) examined offensive playing style, recognizing its importance in match preparation and scouting. Their work addressed the sequential patterns of a team's offensive style, employing a discrete-time Markov chain (DTMC) model to generalize the past behaviors of teams. This model aimed to extract styles less influenced by the rarity of shots and goals, thereby capturing both the positional and the sequential dimensions of a team's style. In addition, it allowed for the evaluation of style efficiency and similarities with other teams, enriching the understanding of football tactics. Chawla et al. (32) pioneered the automation of pass evaluation in football matches, leveraging trajectory data and computational geometry. Through the application of ML techniques, particularly a player motion model, this model achieved 90.2% accuracy in pass rating. Their methodology, rooted in complex data structures derived from computational geometry, paved the way for a more nuanced understanding of passing dynamics within the game. Meanwhile, Cho et al. (33) applied deep learning techniques to analyze player pass styles with heightened precision. This approach introduced a passing style descriptor, named Pass2vec and built on a convolutional autoencoder, that aimed to characterize player styles with enhanced accuracy. By doing so, the researchers envisioned facilitating a better understanding of passing dynamics, thereby potentially revolutionizing player training and recruitment strategies.
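The discrete-time Markov chain idea mentioned above can be illustrated with a minimal sketch that estimates first-order transition probabilities from possession sequences. The pitch zones, the terminal states ("shot", "lost"), and the sequences themselves are hypothetical, and this is not the exact formulation used by Clijmans et al. (31).

```python
from collections import defaultdict

def estimate_transition_matrix(sequences):
    """Maximum-likelihood transition probabilities of a first-order DTMC."""
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        # Count every observed move from one state to the next.
        for current, nxt in zip(seq, seq[1:]):
            counts[current][nxt] += 1
    matrix = {}
    for state, nxt_counts in counts.items():
        total = sum(nxt_counts.values())
        matrix[state] = {nxt: c / total for nxt, c in nxt_counts.items()}
    return matrix

# Hypothetical possession sequences over coarse pitch zones
possessions = [
    ["defence", "midfield", "attack", "shot"],
    ["defence", "midfield", "defence", "midfield", "attack", "lost"],
    ["midfield", "attack", "shot"],
    ["defence", "midfield", "lost"],
]
P = estimate_transition_matrix(possessions)
```

Each row of `P` is a probability distribution over next states, so two teams could be compared by comparing their estimated rows, for example how often possession in midfield progresses to attack.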

In a complementary effort, Bialkowski et al. (34) aimed to identify a team's tactical “signature” by analyzing spatiotemporal player tracking data, employing ML techniques focused on the detection of collective positioning patterns. Leveraging unsupervised ML techniques such as K-means clustering, this study devised an approach that significantly outperformed conventional match descriptors in characterizing team behavior. Their work, focusing on TS analysis and predictive modeling, illuminated the distinctive styles and strategies adopted by different teams, thereby enriching the understanding of dynamic coordination. Another study by Bialkowski et al. introduced an unsupervised method aimed at learning formation templates from spatiotemporal tracking data in football. Their approach, rooted in ML principles, enabled large-scale team analysis by providing insights into team formations and dynamics (35). By aligning spatiotemporal tracking data at the frame level to identify team collective structures and patterns, their methodology contributed to a deeper understanding of the strategic nuances inherent in playing style. Beernaerts et al. (36) focused on spatial movement patterns utilizing a multilayer ANN to analyze individual tactical performances across different playing positions. This approach introduced a qualitative trajectory calculus, known as QTC, to recognize these tactical patterns, offering a nuanced understanding of player dynamics on the field. Shen et al. (37) proposed a CNN-based method aimed at providing accurate training feedback in women's football teams. By employing CNN architecture, they developed real-time analysis tools for coaches, enhancing evaluation precision and facilitating quicker strategy formulation. García-Aliaga et al. (38) delved into determining on-field playing positions based on technical-tactical behavior using ML algorithms, enriching the understanding of player roles and game patterns. In their 2021 study, Goes et al. 
(39) developed an ML model to assess pass effectiveness in football by analyzing spatiotemporal tracking data, with a particular emphasis on disrupting opposing defenses.
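K-means clustering, named above as one of the unsupervised techniques applied to player tracking data, can be sketched in a few lines. The positions (in metres on an assumed 105 x 68 pitch) and the initial centroids are invented for illustration; the reviewed studies use their own features, preprocessing, and initialization.

```python
import math

def kmeans(points, init_centroids, max_iter=100):
    """Plain k-means on 2D points with user-supplied initial centroids."""
    centroids = [tuple(c) for c in init_centroids]
    k = len(centroids)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = (
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                )
    return centroids, labels

# Hypothetical average player positions (x, y) in metres
positions = [
    (22, 20), (25, 34), (24, 48),   # deeper players
    (48, 25), (52, 40), (50, 34),   # central players
    (78, 30), (82, 38),             # advanced players
]
centroids, labels = kmeans(positions, [positions[0], positions[3], positions[6]])
```

With these inputs the three clusters separate along the length of the pitch, which loosely mirrors how positional templates group players into lines.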

Through the application of ML techniques, the reviewed studies devised novel measures for evaluating pass effectiveness, shedding light on tactical performance and strategic decision-making in football matches (39–42). Otherwise, another project by Goes et al. (42) assessed tactical performance by abstracting spatiotemporal features from general offensive principles of play. Utilizing position tracking data, they employed classifiers such as DT, GB, LDA, and QDA to provide valuable feedback to coaches regarding team execution and overall tactical performance, thereby contributing to match outcome prediction (39, 42). Gu et al. (43) contributed to the field of football analytics by quantifying team space-control efficiency during in-game possession. By employing models like ANN, CNN, and LSTM, they measured space-control effectiveness, enhancing the understanding of team dynamics and strategic decision-making on the field. Thus, it is possible to apply DL models to quantify team space-control efficiency, emphasizing dynamic territorial dominance metrics, although both studies address representations of collective behavior. Meanwhile, Gudmundsson and Wolle (40) analyzed player movement patterns, employing clustering techniques to uncover the most common spatial and temporal formations that emerge during a football match. By examining players and the movement of the ball between defensive and offensive zones, they provided valuable insights into team strategies and tactical implementations for both training and competition (39, 40). In a complementary effort, Leo et al. (44) pioneered the development of a multiview system capable of understanding real-time interactions between the ball and players, utilizing 3D ball trajectories to accurately identify moments of player engagement. 
Tested on data from the Italian first-division football championship, their system demonstrated promising potential for automated event identification, particularly in complex scenarios such as offside violations. Link et al. (41) focused on developing models for detecting individual and team ball possession using position data, providing real-time quantification and insights into match dynamics. Their automated event detection systems, based on Bundesliga data, enriched the understanding of possession-based strategies and tactical implementations. Lucey et al. (45) proposed a method for estimating scoring chances in football by leveraging strategic features from player and ball tracking data. Using LR and conditional random field models, they analyzed spatiotemporal patterns preceding shots, thereby quantifying shot efficiency and providing data-driven insights into team strategies.
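
To make the idea of detecting ball possession from position data more concrete, the following minimal sketch assigns possession to the nearest player whenever the ball is within a fixed distance. This is a deliberately simple heuristic, not the model of Link et al.; the 1.5 m radius, the player identifiers, and the frames are assumptions made for illustration.

```python
from math import dist

def possession(ball_xy, players, radius=1.5):
    """Return the id of the nearest player within `radius` metres, else None."""
    nearest_id, nearest_xy = min(players.items(), key=lambda kv: dist(ball_xy, kv[1]))
    return nearest_id if dist(ball_xy, nearest_xy) <= radius else None

# Hypothetical frames: (ball position, {player id: position}); "H" = home, "A" = away
frames = [
    ((30.0, 20.0), {"H7": (30.5, 20.2), "A4": (33.0, 21.0)}),
    ((36.0, 22.0), {"H7": (31.0, 20.5), "A4": (34.0, 21.5)}),  # ball in flight
    ((40.0, 24.0), {"H7": (32.0, 21.0), "A4": (40.4, 24.3)}),
]
timeline = [possession(ball, players) for ball, players in frames]
print(timeline)
```

Team possession could then be read from the first character of each identifier. A production system would also need ball height, velocity, and smoothing across frames, none of which is modeled here.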

Similarly, Kim et al. (46) contributed to real-time multiview analysis by developing a system capable of understanding interactions between the ball and the players. By focusing on 3D ball trajectories and employing innovative analysis techniques, their model held significant promise for the development of automated systems for event identification, potentially revolutionizing match analysis and decision-making processes. Gu et al. (43) quantified team space-control efficiency during possession using ML, employing advanced models like CNN and LSTM to enhance predictive capabilities. Kusmakar et al. (47) quantified team performance through player interactions leading to goal attempts, revealing pattern dynamics through possession chain data analysis. Pappalardo et al. (48) designed a data-driven framework for evaluating football players’ performance comprehensively, aiding scouts in player assessment and recommendation. In addition, Shokrollahi et al. (49) extracted player position TS data to model team tactics and predict match outcomes, employing a hybrid approach of fuzzy logic and deep CNNs for multivariate analysis. Collectively, these studies showcase the diverse applications of ML in dissecting football performance, from individual player actions to team strategies and outcomes. Brooks et al. (50) presented two methods for analyzing pass event data in football, demonstrating their effectiveness through application to the 2012–2013 La Liga season. They showed that teams can be distinguished by their passing styles based on where they attempt passes on the pitch, achieving 87% accuracy in a team classification task using pass location heatmaps. In addition, they investigated the use of pass locations during possessions to predict shots. Furthermore, they used the weights of the predictive model to rank players by the value of their passes. Decroos et al. 
(51) addressed the challenge of analyzing playing styles in football, proposing SoccerMix, a soft clustering technique based on mixture models. This approach overcomes the sparsity of event stream data by grouping similar actions together in a probabilistic manner, enabling the characterization of both team and player playing styles. Notably, SoccerMix offers an alternative perspective on a team's style, focusing on how it influences opponents’ playing styles. Forcher et al. (52) focused on analyzing defensive performance in football, utilizing tracking data to predict successful ball gains in defense. They derived player and team metrics from tracking data and trained machine learning classifiers to distinguish successful defensive plays from unsuccessful ones. The study identified tactical principles related to gaining possession, such as pressing the ball carrier and creating numerical superiority in key areas. García-Aliaga et al. (38, 53) utilized ML algorithms to determine the on-field playing positions of football players based on their technical-tactical behavior. By analyzing non-spatiotemporal descriptors computed from match event records, they identified discriminatory variables for player positions using dimensionality reduction techniques and machine learning algorithms like RIPPER. This approach provided valuable insights for enhancing player performance and identifying positions on the field. Another study presented FatigueNet, a deep learning algorithm for predicting players’ perceived exertion levels from movement data collected during football sessions (54). By preprocessing raw GPS data and leveraging deep learning techniques, FatigueNet achieved effective prediction of perceived exertion, offering a potential automated and objective fatigue monitoring system for players. In their study, Narizuka and Yamazaki (55) delved into the realm of football analytics by focusing on analyzing player performance in relation to different pitch positions. 
They emphasized the importance of understanding how various factors influence player performance across different areas of the pitch. To achieve this, they developed a novel clustering algorithm based on the Delaunay method, which enables the characterization of team formations dynamically.
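
One way to picture a Delaunay-based description of a formation is as an adjacency matrix recording which players are neighbors in the triangulation. The sketch below builds such a matrix with a brute-force empty-circumcircle test, which is adequate for a handful of players. It is only an assumed reading of the general idea, not the algorithm of Narizuka and Yamazaki (55); the function names and coordinates are hypothetical, and in practice a library routine such as `scipy.spatial.Delaunay` would normally be used instead.

```python
from itertools import combinations

def orient(a, b, c):
    """Twice the signed area of triangle abc (positive if counterclockwise)."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])

def in_circumcircle(a, b, c, p):
    """True if p lies strictly inside the circumcircle of triangle abc."""
    (ax, ay), (bx, by), (cx, cy) = [(q[0] - p[0], q[1] - p[1]) for q in (a, b, c)]
    det = ((ax * ax + ay * ay) * (bx * cy - cx * by)
           - (bx * bx + by * by) * (ax * cy - cx * ay)
           + (cx * cx + cy * cy) * (ax * by - bx * ay))
    # The sign of det depends on the orientation of abc, so normalise by it
    return det * orient(a, b, c) > 0

def delaunay_adjacency(points):
    """Adjacency matrix of the Delaunay triangulation (brute force, small n only)."""
    n = len(points)
    adj = [[0] * n for _ in range(n)]
    for i, j, k in combinations(range(n), 3):
        a, b, c = points[i], points[j], points[k]
        if orient(a, b, c) == 0:
            continue  # collinear triple has no circumcircle
        others = (points[m] for m in range(n) if m not in (i, j, k))
        # A triangle is Delaunay if no other point lies inside its circumcircle
        if not any(in_circumcircle(a, b, c, p) for p in others):
            for u, v in ((i, j), (j, k), (i, k)):
                adj[u][v] = adj[v][u] = 1
    return adj

# Hypothetical positions: four players around one central player (index 4)
players = [(0, 0), (10, 0), (10, 8), (0, 8), (5, 4)]
adj = delaunay_adjacency(players)
for row in adj:
    print(row)
```

In this toy layout the central player is adjacent to all four others, while diagonally opposite players are not linked. Matrices of this kind can be compared across frames or fed into a clustering routine to group similar formations.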

By applying this algorithm to datasets from multiple football games, the authors identified average formations such as “1-4-4-2,” “1-4-1-4-1,” and “1-4-3-3” and further explored specific patterns within each formation. This method allows for visualization, quantitative comparison, and time-series analysis of formations, providing insights into team styles and player positional exchanges. Tuyls et al. provide a comprehensive perspective on the intersection of AI, game theory, and computer vision in football analytics (8). They highlight the immense potential of leveraging these fields to revolutionize the analysis of both individual players’ and coordinated teams’ behaviors in football. Through a review of state-of-the-art techniques, they illustrate how combining AI, game theory, and computer vision enables various analyses, including counterfactual analysis using predictive models and game-theoretic analysis of penalty kicks with statistical learning of player attributes. Their work underscores the transformative impact of football analytics not only on the game itself but also on the broader field of AI research. Player performance is the most important factor that affects match scores. Factors affecting player performance are not the same for all players and vary according to pitch positions. Analyzing these performance factors in relation to pitch positions can help understand which characteristics of players need to be developed to win. Player training can be arranged accordingly, and team tactics can be changed or improved. Although the importance of analyzing the individual performances of players according to pitch positions has been emphasized in various studies, the large amount of available data has made this analysis difficult. Machine learning can be used to overcome this difficulty. However, ML studies in sports mostly focus on score prediction. 
There is a lack of traditional and ML approaches that examine the effect of individual player performances on game results. In this context, the datasets of the 2010 and 2014 FIFA WC were analyzed through multilayer artificial neural networks. A specific model was established for each dataset by organizing relevant datasets according to year, player positions, and match levels (group–final). The rectified linear unit was selected as the activation function for each model. Architecture and hyperparameters for each model were determined through grid optimization. The factors affecting player performances were ranked by Gedeon's relative importance calculation. The average performance indicators for the group matches are 81.34% precision, 87% recall, and 0.84 F1 score (38). They begin by examining existing research on image recognition in football and then proceed to develop a novel football image classification model. This model integrates bidirectional LSTM to extract spatial features and capture temporal dynamics inherent in image sequences. Through rigorous simulation analyses, they demonstrate the model's high recognition accuracy and consistent performance in action recognition and classification tasks. Their findings offer valuable insights into injury prevention and personalized skill enhancement in football training. By analyzing datasets related to sports achievements and employing deep learning models, they identify the KPIs influencing achievements and develop predictive models for accurate prediction. Their study highlights the importance of understanding and predicting sports achievements, offering valuable insights for improving athletic performance and training strategies (42, 56). Yücebaş (56) delves into the intricate relationship between player performance and match outcomes in football, particularly focusing on how performance factors vary across different pitch positions. 
Recognizing the importance of understanding these nuances for strategic planning, the study employs advanced machine learning techniques to analyze datasets from the 2010 and 2014 FIFA WC. By establishing specific models tailored to each dataset and utilizing multilayer artificial neural networks, the study aims to uncover the factors influencing player performances and their impact on match outcome.
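
The reported group-match figures can be checked for internal consistency, since the F1 score is the harmonic mean of precision and recall. The short sketch below recomputes it from the two reported values; the function name `f1_score` is simply a local helper defined here.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Values reported for the group matches: 81.34% precision, 87% recall
f1 = f1_score(0.8134, 0.87)
print(round(f1, 2))
```

Rounded to two decimals this gives 0.84, which agrees with the F1 score quoted in the text.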

Novel spatiotemporal-based models on player movement and team formations were developed based on convolutional neural networks and deep learning architectures (4, 26). Their empirical comparison demonstrated the superiority of these kernels and their efficient approximations for clustering tasks in team sports data, effectively addressing limitations found in existing techniques (25, 57). Supervised and unsupervised ML are distinguished primarily by the presence or absence of labeled data during training (58). In supervised learning, the training data are accompanied by labels indicating the class or category of each data point, allowing models to learn from known outcomes and make predictions accordingly. In contrast, unsupervised learning involves datasets without labels, where the objective is to uncover intrinsic structures, groupings, or patterns—often for clustering or dimensionality reduction—without external guidance. Fernando et al. (59) explored goal-scoring patterns in football using player and ball-tracking data. They utilized fine-grained tracking data from Prozone to cluster multiagent trajectories and developed an EGV or xG model for analysis. Their research aimed to identify and quantify goal-scoring methods of teams while comparing their goal-scoring styles. In addition, Lucey et al. (45) developed a method to estimate chances in football by analyzing strategic features extracted from player and ball tracking data. Their study focused on analyzing spatiotemporal patterns before shots using LR and conditional random field analysis. By quantifying shot efficiency and team strategy based on spatiotemporal data analysis, they provided valuable insights into the factors influencing goal likelihood and team performance (41, 60).

In supervised learning, some of the main techniques include LR, classification, and neural networks. LR models the relationship between a continuous dependent variable and one or more independent variables, aiming to predict numerical values. Classification techniques, on the other hand, categorize data into predefined classes or categories. Common algorithms include decision trees, k-nearest neighbors (KNN), and support vector machines (SVMs). Neural networks, inspired by the functioning of the human brain, consist of multiple layers of artificial neurons and can be applied to both regression and classification problems. Popular architectures include convolutional neural networks (CNNs) and RNNs. In unsupervised learning, the main techniques include clustering, dimensionality reduction, and association rules. Clustering groups data based on similarities, with common algorithms including k-means, hierarchical clustering, and Gaussian mixture models. Dimensionality reduction techniques aim to reduce the number of variables in a dataset while preserving as much variability as possible. Common methods include PCA and t-SNE. Association rule mining identifies frequent relationships between different variables in a dataset, with the best-known algorithm being Apriori, often used in market analysis and product recommendation systems. These are among the most widely used techniques in supervised and unsupervised machine learning. The choice of methods depends on the specific problem being addressed and the characteristics of the available data.
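
To contrast with the unsupervised setting, the sketch below shows supervised classification in its simplest form: a k-nearest neighbors (KNN) vote over labeled examples. The two features, the role labels, and all values are hypothetical toy data invented for illustration, and the features are left unscaled for brevity.

```python
from collections import Counter
from math import dist

def knn_predict(train, query, k=3):
    """Label `query` by majority vote among its k nearest labelled examples."""
    neighbours = sorted(train, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

# Hypothetical labelled data: (passes per match, tackles per match) -> role
train = [
    ((30, 4.0), "defender"), ((34, 3.5), "defender"), ((28, 4.5), "defender"),
    ((55, 2.0), "midfielder"), ((60, 1.8), "midfielder"), ((52, 2.4), "midfielder"),
]
pred_a = knn_predict(train, (57, 2.1))
pred_b = knn_predict(train, (31, 4.2))
print(pred_a, pred_b)
```

The labels are what make this supervised: the model only reproduces categories it was shown during training, whereas a clustering method would have to discover the two groups without them.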

4.1 Practical applications, research limitations, and future research

Dealing with raw data (x, y, z) from GPS, LPM, or VBMA tracking in sports analytics requires robust IT infrastructure capable of handling large volumes of data, processing it efficiently, and extracting meaningful insights. The reviewed studies highlight the growing importance of advanced analytics, particularly in football, to enhance player performance, tactical awareness, and overall team dynamics. Through innovative approaches such as clustering algorithms, ANN, ML, and DL techniques, and the integration of AI, game theory, and computer vision, researchers are uncovering complex movement patterns within player performances, formations, and game strategies. By analyzing factors such as player positioning, team formations, and transitional patterns, these studies aim to provide valuable insights into optimizing player training, refining team tactics, and ultimately improving game outcomes. Furthermore, the use of advanced data processing methods, such as DL algorithms and image recognition techniques, enables the extraction of comprehensive features from highly complex datasets, allowing for an accurate performance assessment and the development of predictive models. By understanding the nuances of player behaviors and game dynamics, coaches and analysts can make informed decisions to enhance training regimens, develop personalized strategies, and maximize player potential. Moreover, the integration of ML not only facilitates the analysis of retrospective performance analysis but also enables real-time monitoring and predictive insight, empowering teams to adapt and strategize dynamically during matches. Overall, these studies underscore the transformative impact of data-driven approaches in football analytics, offering a deeper understanding of player performance, team formations, and game strategies. 
By harnessing the power of advanced analytics and AI technologies, researchers aim to revolutionize player development, tactical planning, and overall game management. The insights gained from these studies have the potential to reshape the landscape of association football analytics, driving continuous innovation and improvement in player and team performance analysis. Indeed, AI-based applications offer substantial practical benefits for understanding the progression of training sessions or adjusting tactical strategies in real time during match play. By leveraging spatiotemporal tracking data with advanced modeling techniques, sport scientists, coaches, and performance analysts can identify individual and collective patterns, assess team cohesion and match principles, and simulate opponent behaviors, enabling more informed decision-making in both the training process and match management. However, the full potential of these applications remains constrained by critical limitations in data accessibility, replicability, and standardization. Current datasets vary widely in sampling frequency, data structure, and proprietary constraints across different leagues and platforms, posing challenges to cross-study comparisons, algorithmic generalization, and the collection of consistent and reliable longitudinal data. To address this, it is essential to advocate for the creation of open-access benchmark datasets and the adoption of standardized data collection protocols. These initiatives would not only enhance the reproducibility of AI-driven tactical analyses but also democratize access to cutting-edge tools for clubs, researchers, and federations with limited resources, fostering broader innovation in performance optimization.
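
As a small example of turning raw positional coordinates into collective indicators, the sketch below computes a team centroid and the area of the convex hull around the outfield players, a quantity often used as a proxy for the space a team occupies. The frame of coordinates is hypothetical, and the helper names are chosen here for illustration rather than taken from any reviewed study.

```python
def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counterclockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def polygon_area(vertices):
    """Shoelace formula for the area of a simple polygon."""
    n = len(vertices)
    twice_area = sum(vertices[i][0] * vertices[(i + 1) % n][1]
                     - vertices[(i + 1) % n][0] * vertices[i][1] for i in range(n))
    return abs(twice_area) / 2

def centroid(points):
    """Mean position of all players in the frame."""
    xs, ys = zip(*points)
    return (sum(xs) / len(xs), sum(ys) / len(ys))

# Hypothetical single frame: (x, y) in metres for ten outfield players
frame = [(20, 10), (60, 10), (60, 40), (20, 40), (35, 22),
         (45, 30), (40, 25), (50, 15), (30, 35), (40, 18)]
area = polygon_area(convex_hull(frame))
center = centroid(frame)
print(area, center)
```

Computed frame by frame, such indicators give simple time series of team compactness and position that can serve as inputs to the ML and DL models discussed above.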

In addition, the practical application of AI to understanding football tactical behavior, collective dynamics, and movement patterns enhances the strategic capabilities of coaches, facilitates player development, improves opposition analysis, provides real-time decision support, enables performance prediction, and enhances talent identification processes in football. Concretely, these insights can inform the data science and match analysis departments of football clubs in the following areas: (1) tactical insights: by leveraging AI algorithms to analyze vast amounts of match data in football, coaches and analysts can gain deeper insights into the tactical strategies employed by individual players and teams. This includes understanding patterns of play, positional rotations, pressing schemes, and defensive organization. Such insights can inform tactical adjustments during matches and help teams exploit opponent weaknesses; (2) player development: AI-driven analysis allows for a granular examination of individual player performance within the context of team tactics (33, 61). Coaches can identify players’ strengths and areas for improvement, tailor training programs accordingly, and provide targeted feedback to enhance overall team cohesion and performance (47); (3) opponent analysis: AI-powered systems enable comprehensive scouting and analysis of upcoming opponents. By dissecting the tactical tendencies, formations, and key player behaviors of opponents, teams can develop specific game plans and counterstrategies to maximize their chances of success (38, 47, 50, 53); (4) real-time decision support: AI tools can provide real-time insights and recommendations to coaches during matches. 
By continuously analyzing live match data, these systems can offer suggestions for substitutions, tactical adjustments, and set-piece strategies, empowering coaches to make informed decisions under pressure; (5) performance prediction: AI models can be trained to predict match outcomes based on historical data and contextual factors (7, 19). While not infallible, these predictive analytics can help teams assess their chances against specific opponents and adjust their approach accordingly (36, 47); (6) talent identification: AI-driven analysis can aid in the identification and recruitment of talented players (40).

By analyzing player performance metrics, playing styles, and potential, clubs can make more informed decisions when scouting and signing new players, optimizing their recruitment strategies (40, 52). Some AI-based approaches prioritize the identification of positional regularities through pattern recognition in tracking and spatiotemporal data, while others focus on the continuous evaluation of effective playing space by modeling dynamic territorial control. This methodological distinction underscores the specific strengths of each technique. ML models generally offer greater interpretability and are well-suited for segmenting known behavioral patterns, whereas DL models exhibit a higher capacity to model complex and evolving phenomena, although at the expense of interpretability. Consequently, a structured comparative analysis suggests that the selection of AI models should be guided by the nature of the tactical behavior being investigated and the degree of model explainability required for practical application in training and competition. Specifically, AI models were employed to extract meaningful tactical indicators from positional data, enabling pattern recognition in team strategies and player behavior. In this context, there is still a need for professional profiles that combine sports science, data analytics, computer science, and coaching expertise. Also, challenges remain for the practical application and ethical regulation of AI-based techniques in football science.

Other repositories or digital libraries consulted in the systematic review comprise the following: GitHub (https://github.com/) (31), scipy.cluster.hierarchy and scipy.spatial (55), FA software (40), and HalvingGridSearchCV (52). The most widely used API platforms in the literature for collecting tracking data and KPIs were InStat Inc., Opta Sports, Wyscout, STATS, Second Spectrum, SciSports, and StatsBomb. A future review should distinguish the differences in KPIs between platforms and identify which studies were carried out with each of them. The most widely used programming environments are MATLAB, Python, and RStudio. The prompts for each of these environments should be explored further to better understand what impact AI-based algorithms have on data visualization and, specifically, tactical analysis. The development of consensus statements, guidelines, and recommendations for the transparency, interpretability, and application of complex AI models (e.g., CNN, LSTM, RNN) and explainable AI (XAI) techniques remains unsettled and needs a more generalized consensus. Thus, the ethical considerations surrounding these models and their relevance for practitioner trust and adoption should be examined in more depth. In addition, institutions, umbrella organizations, and federations must address the key ethical issues, including data privacy, potential algorithmic biases, and the responsible use of player tracking data. These reflections are intended to foster a more critical and responsible application of AI in sports contexts, particularly for “black box” models.

The expansion to underrepresented populations and the integration of football insights into dataset routines and coaching decision-making still need to be explored (4, 26). Nevertheless, the possibility of automated modeling for predicting training outcomes, match running management (19, 62), talent identification (9, 63), injury prevention (64, 65), and pacing strategies (63, 66) leaves room for expanding the results already published by the studies reviewed. To that end, researchers should prioritize an integrative approach and scale up these datasets. There is still an extensive gap to fill in youth (4, 62) and women's (67–69) football, especially in subelite settings, different competition levels, and contextual variability. Finally, the importance of multidisciplinary teams for AI model development and interpretation, bridging the knowledge gap between developers and end users (e.g., coaches) and developing training programs and digital literacy for effective AI use, must be underscored. The next DL, ML, and AI-based models must be developed to support decisions in (1) real-time decision support, performance prediction, and match management; (2) strategic and tactical thinking, training task design and planning, and substitution management; (3) practical integration of spatiotemporal tracking data into coaching and practice routines. Also, the effect of these models in other areas of coaching and training should be explored, specifically in organizational management and communication.

5 Conclusions

This systematic review summarizes the latest trends in the literature on the use of AI-based methodologies to understand individual and collective tactical patterns in football. Utilizing insights from studies on goal-scoring patterns, spatial movement analysis, and performance evaluation through ANN, DL, and ML, coaches can refine training sessions to enhance offensive tactics, defensive strategies, and player development, ultimately improving team performance on the field. Furthermore, AI-based tactical assessment tools provide real-time and predictive analysis capabilities, improving decision-making processes and tactical planning in football training and competition. In conclusion, AI-based models can effectively reshape the landscape of spatiotemporal tracking data into training and practice routines with real-time decision-making support, performance prediction, match management, tactical-strategic thinking, and training task design. Nevertheless, there are still challenges for the real practical application of AI-based techniques, as well as ethical regulation and the formation of professional profiles that combine sports science, data analytics, computer science, and coaching expertise.

Author contributions

JT: Conceptualization, Formal analysis, Writing – original draft, Investigation, Methodology, Writing – review & editing. EM: Data curation, Formal analysis, Visualization, Writing – original draft. PA: Methodology, Software, Validation, Writing – review & editing. SE: Conceptualization, Methodology, Resources, Writing – review & editing. GM: Data curation, Validation, Visualization, Writing – review & editing. RM: Conceptualization, Data curation, Formal analysis, Writing – review & editing. TB: Conceptualization, Methodology, Resources, Validation, Writing – review & editing. AM: Project administration, Supervision, Validation, Writing – review & editing. PF: Project administration, Supervision, Writing – review & editing. RF: Methodology, Software, Validation, Writing – review & editing. LB: Conceptualization, Data curation, Formal analysis, Methodology, Visualization, Writing – original draft, Writing – review & editing.

Funding

The authors declare that financial support was received for the research and/or publication of this article. This project was supported by the National Funds through the FCT Portuguese Foundation for Science and Technology (project UID/CED/04748/2020 and UIDB04045/2021), Life Quality Research Center (LQRC-CIEQV), Santarém, Portugal; Research Centre in Sports Sciences, Health Sciences and Human Development, Vila Real, Portugal; SPRINT—Sport Physical Activity and Health Research and Innovation Center, Portugal; and Research Center for Active Living and Wellbeing (Livewell), Bragança, Portugal.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Generative AI statement

The author(s) declare that no Generative AI was used in the creation of this manuscript.

Publisher's note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

References

1. Sarmento H, Marcelino R, Anguera MT, Campaniço J, Matos N, Leitão JC. Match analysis in football: a systematic review. J Sports Sci. (2014) 32:1831–43. doi: 10.1080/02640414.2014.898852

2. Rico-González M, Los Arcos A, Nakamura FY, Moura FA, Pino-Ortega J. The use of technology and sampling frequency to measure variables of tactical positioning in team sports: a systematic review. Res Sports Med. (2020) 28:279–92. doi: 10.1080/15438627.2019.1660879

3. Sarmento H, Clemente FM, Araújo D, Davids K, McRobert A, Figueiredo A. What performance analysts need to know about research trends in association football (2012–2016): a systematic review. Sports Med. (2018) 48:799–836. doi: 10.1007/s40279-017-0836-6

4. Teixeira JE, Forte P, Ferraz R, Branquinho L, Silva AJ, Monteiro AM, et al. Integrating physical and tactical factors in football using positional data: a systematic review. PeerJ. (2022) 10:e14381. doi: 10.7717/peerj.14381

5. O’Donoghue P. Research Methods for Sports Performance Analysis. London: Routledge (2009). doi: 10.4324/9780203878309