Dynamic transfer learning with co-occurrence-guided multi-source fusion for urban spatio-temporal crime prediction

Cui, Chen; Zheng, Ziwan; Du, Hao; Wang, Wen

doi:10.3389/fdata.2026.1697392

ORIGINAL RESEARCH article

Front. Big Data, 05 February 2026

Sec. Data Analytics for Social Impact

Volume 9 - 2026 | https://doi.org/10.3389/fdata.2026.1697392

Chen Cui 1,2
Ziwan Zheng 1*
Hao Du 1
Wen Wang 3

1. Key Laboratory of Public Security Information Application Based on Big-data Architecture, Ministry of Public Security, Hangzhou, China
2. College of Computer Science and Technology, Hangzhou Dianzi University, Hangzhou, China
3. Zhejiang SUPCON Information Co., Ltd, Hangzhou, China

Abstract

Spatio-temporal crime prediction is crucial for optimizing police resource allocation but faces two challenges: data sparsity, which hinders models from extracting effective patterns and limits robustness, and the underutilization of cross-type crime co-occurrence correlations. To address these issues, we propose a transfer learning approach that explores underlying cross-type relationships, enabling the sharing of spatio-temporal features across crime types and alleviating data sparsity. An adaptive weight updating mechanism is incorporated to enhance the perception of distinct crime categories, while the impacts of points of interest (POIs), meteorological factors, and other features are also analyzed. Experiments on real-world data from a Chinese city show that our model comprehensively captures latent features across crime types, thereby enhancing predictive performance and robustness, particularly for crime types with sparse data. Moreover, it effectively incorporates environmental features, further improving crime prediction performance.

1 Introduction

Spatio-temporal crime prediction holds great significance, as it can provide guidance for police resource allocation, thereby reducing public property losses. It is pivotal for enabling proactive policing, optimizing resource efficiency, and enhancing urban safety management—especially in densely populated urban areas where crime incidents are diverse and dynamic, placing high demands on real-time and precise security deployment. With the acceleration of urbanization and the expansion of urban areas, the complexity of crime patterns has increased, making traditional reactive policing strategies inadequate. Thus, this topic increasingly attracts the attention of researchers. Traditional spatio-temporal crime prediction is primarily based on either one or both of the temporal and spatial correlations in crime (Zhao and Tang, 2017), with representative approaches including crime near-repeat models (Townsley et al., 2003), kernel density estimation models (Bowers et al., 2004; Kalinic and Krisp, 2018), and self-exciting point process models (D'Orsogna and Perc, 2015; Mohler et al., 2011). However, these traditional approaches exhibit limited applicability. On the one hand, these models are too simplistic, making it challenging to capture complex spatio-temporal correlations in crime. On the other hand, these traditional approaches do not take into account multiple crime-related auxiliary data sources such as points of interest (POIs) and weather data.

With the increasing application of AI technology, research in geospatial artificial intelligence (GeoAI) has been on the rise. GeoAI models can capture complex spatio-temporal correlations and extract features from multiple auxiliary data. Currently, they are mainly applied to spatio-temporal prediction tasks in data-intensive domains such as traffic flow, temperature, and air quality (Gao, 2021; Deng et al., 2025). These methods have achieved favorable predictive results, prompting researchers to actively study spatio-temporal crime prediction using these techniques in recent years (Sun et al., 2023). However, several challenges persist in the field of spatio-temporal crime prediction.

1.1 Challenge 1 (addressing the sparsity of spatio-temporal crime data)

Previous research has demonstrated that crime is not evenly distributed in space, and crime data are sparse and not continuous in both time and space dimensions (Andresen and Malleson, 2011; Chainey and Ratcliffe, 2013; Ratcliffe, 2004), making spatio-temporal crime prediction a challenging task (Weisburd, 2015). A common strategy to address sparsity is spatiotemporal aggregation, which increases data density by predicting for larger areas or longer periods. However, this leads to a coarsening of analytical granularity, sacrificing detailed insights for statistical stability. To address this issue, current methods have focused on densifying crime data. Typically, smoothing techniques are employed to create a pseudo-continuity in the temporal dimension (Calatayud et al., 2023; Kumar et al., 2018; Taddy, 2010; Zhang and Cheng, 2020). When dealing with spatial dimensions, it is common to apply weighted smoothing techniques that account for the near-repeat effect (Calatayud et al., 2023; Hart et al., 2022; Johnson et al., 2009; Kidner et al., 2002). However, the potential problems and limitations of using smoothing techniques to handle sparse data, such as information loss and overfitting, have affected the performance of spatio-temporal crime prediction models. In summary, it is both important and challenging to address the sparsity in spatio-temporal crime prediction.

1.2 Challenge 2 (modeling cross-type temporal–spatial correlation adequately)

A previous study (Zhao et al., 2022) verified the existence of correlations among different types of crime from temporal and spatial perspectives, that is, the co-occurrence phenomenon among different types of crime (Liu et al., 2018b). These existing research results provide the foundation for leveraging cross-type correlations for accurate spatio-temporal crime prediction, which is often overlooked in most current spatio-temporal crime prediction models. Utilizing these correlations allows models to share learned patterns across crime types, which is particularly beneficial for improving predictions when data for a specific type is sparse.

To address these challenges, based on transfer learning, which aims to leverage knowledge obtained from one domain to enhance learning performance in another, we studied an adaptive transfer learning model for spatio-temporal crime prediction. Specifically, transfer learning is employed to tackle data sparsity by sharing features across crime types, and an adaptive weight mechanism is designed to model cross-type correlations adequately. The following summarizes our main contributions.

(1) Based on the crime co-occurrence phenomenon, a spatio-temporal crime prediction model is proposed using transfer learning, which analyzes spatial, environmental, and temporal characteristics of different types of crimes.
(2) An adaptive method utilized in the transfer learning process allows the model to pay sufficient attention to the influence of different types of crimes, effectively alleviating data sparsity and enhancing the model's robustness and performance for crime types with limited data.
(3) We conducted comprehensive experiments using real-world datasets to assess the performance of the proposed spatio-temporal crime prediction model and analyze the role of each component in the model for crime prediction.

2 Related work

In this section, we mainly discuss related work on spatio-temporal crime prediction based on artificial intelligence.

2.1 Traditional machine learning models

The random forest, a traditional machine learning model, has been widely applied in predicting crime hotspots (Alves et al., 2018). For example, Liu et al. (2018a) adopted the random forest and kernel density method to predict crime hotspot situations across different time periods. However, it focused on the distribution of crime data and ignored the potential influence of spatio-temporal factors such as weather conditions and the built environment on criminal behavior. Therefore, in their subsequent research (Liu et al., 2019), Liu et al. deeply analyzed the spatial variations of crime and the built environment, as well as the varying relationship between crimes and the built environment, employing the random forest algorithm to predict public property crime. This demonstrated that crime prediction models can be improved by incorporating the aforementioned spatial variations and spatially varying relationships.

As another typical machine learning model, boosting learning algorithms have also been widely applied to address crime prediction challenges. For example, Zhang et al. (2022) and Deng et al. (2023) compared the XGBoost model with other popular machine learning models such as logistic regression, decision trees, and random forests; the XGBoost model clearly showed the best fit. Kim and Lee (2023) also stated that the Light Gradient Boosting Machine model (LightGBM) was chosen as the most suitable model for predicting crime incidence in 250 m grid units of Seoul by comparing linear regression, random forests, and multi-layer perceptrons.

Although random forests and boosting learning algorithms have shown promising results in crime prediction, their relatively lower model complexity may limit their ability to capture highly intricate non-linear spatio-temporal relationships. Additionally, when dealing with extensive and diverse crime data and multiple crime-related auxiliary data sources, they might encounter constraints in terms of their robustness, potentially overfitting or failing to capture underlying relationships within the data.

2.2 Deep learning models

Deep learning models stand out for their capability to automatically learn hierarchical feature representations directly from raw, high-dimensional spatio-temporal data, an advantage over many machine learning methods that often rely on carefully engineered features. This representation learning capability, coupled with their ability to model complex non-linear relationships, further advances their adoption in this field. For example, Yan and Hou (2020) employed a long short-term memory (LSTM) neural network, which is particularly adept at capturing temporal dependencies, for theft crime prediction. Wu et al. (2023) researched the issue of fairness in crime prediction models implemented with deep learning approaches. Gao et al. (2022) combined LSTM and regression to predict telecommunication network fraud crimes. In addition, due to their inductive bias for spatial structures, convolutional neural networks (CNNs) and graph neural networks (GNNs) have been widely applied in spatio-temporal crime prediction. For example, Duan et al. (2017) utilized a deep convolutional neural network for spatio-temporal crime prediction. Graph-based models were proposed to capture dynamic patterns of criminal behavior for crime prediction (Sun et al., 2023; Han et al., 2020; Wang et al., 2020, 2018). Recently, Shahmoradi et al. (2025) proposed a hybrid model integrating ST-ResNet and LSTM for precise crime hotspot prediction, demonstrating the effectiveness of combining spatial and temporal deep learning architectures. Overall, researchers are currently primarily focused on spatio-temporal crime prediction based on deep learning. However, there are not many research results on the issues of the sparsity of spatio-temporal crime data and the cross-type correlations of different types of crime so far, requiring further investigation. 
Specifically, the potential of explicitly modeling cross-type correlations for knowledge transfer and as a built-in mechanism to counteract data sparsity remains underexplored. Here, transfer learning, which has been widely applied in fields such as computer vision (Gopalakrishnan et al., 2017; Shin et al., 2016), natural language processing (Ruder et al., 2019), and autonomous driving (Huang et al., 2018), is adopted for spatio-temporal crime prediction to effectively utilize knowledge transfer to learn the relationships between crime types and alleviate the problem of data sparsity for certain crime types.

3 Preliminaries

Urban Data: The information on crime occurrences, including crime type, timestamp, latitude, longitude, and surrounding environmental conditions, is recorded. Every crime report is mapped to a geographic region based on its location. Here, we begin with some necessary notations and then formally present the problem studied in this work. Particularly, we consider a set of R regions in a city, K types of crime (e.g., burglary and assault), and a sliding window with T time slots (e.g., days). Define r, i, and t as the indices for the region, crime type, and time slot, respectively. Let X_i denote the observed data for the i-th type of crime together with its crime-related auxiliary data sources, where the elements X_{i, r, t, 1} and X_{i, r, t, h} are, respectively, the number of crimes of the i-th type and the h-th crime-related auxiliary feature (e.g., temperature and rainfall) observed at the r-th region in the t-th time slot in a sliding window, and H is the number of crime-related auxiliary features selected. In addition, define a crime vector y_i to represent the quantitative distribution of the i-th type of crime across regions. Specifically, each element y_i(r) is the number of crimes of the i-th type at the r-th region in the (T+1)-th time slot. Furthermore, the supervised processed data for the i-th crime type can be denoted D_i, a set of N sample pairs generated by the sliding window. The n-th sample pair consists of an input tensor and the corresponding output vector for that sample.

Task Formalization: The main task is to construct a predictive model for each type of crime that utilizes transfer learning to fully learn the cross-type spatio-temporal correlation characteristics of the K types of crime. Specifically, the model input is the tensor X_i of historical crime counts and features, and the output is the predicted crime count, that is, the quantitative distribution of crimes of a certain type at time slot T+1. The prediction process can be described as a mapping from X_i to y_i, where each element of y_i is the predicted crime count of the i-th type in the corresponding region. This is a regression task: we aim to predict continuous crime count values rather than discrete categories.
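The sliding-window construction of sample pairs described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function name `make_sliding_window_samples`, the time-first array layout, and the convention that feature index 0 holds the crime count are all assumptions made for the example.

```python
import numpy as np

def make_sliding_window_samples(data, T):
    """Build (input, target) pairs for one crime type.

    data: array of shape (num_slots, R, F) holding, for every time slot and
          region, F features; feature 0 is ASSUMED to be the crime count.
    T:    number of time slots in each sliding window.
    Returns inputs of shape (N, T, R, F) and targets of shape (N, R).
    """
    data = np.asarray(data, dtype=float)
    num_slots = data.shape[0]
    if num_slots <= T:
        raise ValueError("need more than T time slots to form one sample")
    n_samples = num_slots - T
    # Each input is a window of T consecutive time slots.
    inputs = np.stack([data[n:n + T] for n in range(n_samples)])
    # Each target is the crime count per region in the slot right after the
    # window, i.e. the (T+1)-th slot relative to the window start.
    targets = np.stack([data[n + T, :, 0] for n in range(n_samples)])
    return inputs, targets

# Toy data: 10 daily slots, 4 regions, 3 features (count + 2 auxiliary).
rng = np.random.default_rng(0)
toy = rng.poisson(0.1, size=(10, 4, 3)).astype(float)
inputs, targets = make_sliding_window_samples(toy, T=7)
print(inputs.shape, targets.shape)
```

With 10 slots and T = 7, the toy data yield three sample pairs, each pairing a 7-day window with the per-region counts of the following day.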

4 Proposed method

The proposed framework—the co-occurrence-guided adaptive transfer learning (CATL) model—based on the co-occurrence phenomenon of crimes, is designed to solve the above-formulated spatio-temporal crime prediction task, as shown in Figure 1. The proposed model consists of two major modules, namely feature extraction and adaptive model optimization. Specifically, we utilize two types of modules to calculate spatial similarity, compute temporal similarity via the temporal feature extraction module, and finally output the crime prediction results. The two modules are explained in detail in the following subsections.

Figure 1

4.1 Feature extraction module

First, we construct K independent convolutional long short-term memory [ConvLSTM (Khosravinia et al., 2023)] networks to extract spatial and temporal features of each type of crime. ConvLSTM is a spatio-temporal extension of standard LSTM, integrating convolutional operations into input-to-state and state-to-state transitions. It is selected for our grid-structured crime data because it simultaneously captures spatial correlations via convolutions and temporal dependencies via LSTM gates, which avoids the information loss caused by separate spatial–temporal processing, a limitation of other hybrid models. The model architecture, optimized via full grid search, is detailed as follows: convolutional part (1,024 nodes per layer, 3 × 3 convolutional kernels with stride = 1 and padding = 1); recurrent part (1,024 nodes per layer); activation functions (sigmoid for gate control, tanh for state update); and output feature dimensions (spatial v = 512, temporal V = 512). The number of hidden layers for the convolutional part is l = 2 and that for the recurrent part is m = 2. Let the current input be a segment of data for the i-th type of crime. In spatial feature extraction, the output of the K CNNs with l hidden layers can be formulated as

where v is the output feature dimension of each hidden layer, q denotes the q-th hidden layer, and θ denotes the learnable model parameters of the CNN.

Then, LSTM is used to extract temporal features, and the output of the K LSTMs with m hidden layers can be formulated as

where V is the feature dimension of each hidden layer, q denotes the q-th hidden layer, and the LSTM has its own set of learnable model parameters.

After obtaining the output results for each hidden layer of both CNN and LSTM, we need to compute the overall probability distribution differences among crime types. Given that each hidden layer of CNN and LSTM contains only partial information of the input data, all hidden layer outputs should be considered when calculating the overall distribution differences. Given a crime type-pair (D_i, D_j), the loss of probability distribution differences can be formulated as

where d(·, ·) represents the probability distribution distance function. Considering the sparsity of crime spatio-temporal data, we adopt the maximum mean discrepancy (MMD) as the distance measure d(·, ·) to better measure distribution discrepancy among different types of crimes and enhance computational efficiency:

where k(·, ·) is a Gaussian kernel function and N_s and M_t are the numbers of data points drawn from the two distributions being compared. Furthermore, considering the particularity of crime data, it is essential for the model to effectively capture low-frequency but high-impact crime patterns. Therefore, we adopt a more sensitive bandwidth function h for the Gaussian kernel function k(·, ·) as follows:

where σ is a constant used to control the size of the bandwidth function and ϵ is a small value to prevent numerical instability during computation.
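For readers unfamiliar with MMD, the sketch below computes the standard (biased) estimate of the squared MMD between two sets of feature vectors with a Gaussian kernel. It is an illustration under stated assumptions: the kernel uses a fixed scalar `bandwidth` as a stand-in, since the paper's bandwidth function h is not reproduced here, and the function names are hypothetical.

```python
import numpy as np

def gaussian_kernel(a, b, bandwidth, eps=1e-8):
    """Gaussian kernel matrix between the rows of a (n, d) and b (m, d)."""
    sq_dists = (
        np.sum(a ** 2, axis=1)[:, None]
        + np.sum(b ** 2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    sq_dists = np.maximum(sq_dists, 0.0)  # clip tiny negatives from round-off
    # ASSUMPTION: a fixed bandwidth is used here instead of the paper's
    # bandwidth function h; eps only guards against a zero denominator.
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2 + eps))

def mmd_squared(source, target, bandwidth=1.0):
    """Biased estimate of squared MMD between two samples of feature vectors."""
    source = np.asarray(source, dtype=float)
    target = np.asarray(target, dtype=float)
    k_ss = gaussian_kernel(source, source, bandwidth)
    k_tt = gaussian_kernel(target, target, bandwidth)
    k_st = gaussian_kernel(source, target, bandwidth)
    # Mean within-sample similarity minus twice the cross-sample similarity.
    return float(k_ss.mean() + k_tt.mean() - 2.0 * k_st.mean())

rng = np.random.default_rng(0)
feats_a = rng.normal(size=(50, 2))
feats_b = rng.normal(size=(50, 2))
same = mmd_squared(feats_a, feats_b)           # same underlying distribution
shifted = mmd_squared(feats_a, feats_b + 3.0)  # clearly shifted distribution
print(same, shifted)
```

The estimate stays near zero when both samples come from the same distribution and grows when one sample is shifted, which is the property a distribution discrepancy loss between crime types relies on.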

4.2 Adaptive model optimization

To minimize the predictive error for each type of crime, the mean square error (MSE) loss function is often adopted.

where |D_j| is the total length of the j-th type of crime data after being processed with a sliding window.

However, solely minimizing Equation 10 is insufficient for the model to learn the distributional differences among different types of crimes. Therefore, we combine Equation 5 and Equation 6 and introduce distributional discrepancy:

During the training process, to avoid overly focusing on a few types of crime distributions while neglecting others, we introduce an adaptive weight for each crime type-pair to update the importance of different types of crime distributions. The weight update is driven by the rate of change of the distribution discrepancy loss between crime types D_i and D_j in the N-th epoch. In summary, the loss function and the adaptive weight update rule can be formulated as

When this rate of change shows that the distribution discrepancy between D_i and D_j has increased in epoch N, we increase the importance of this discrepancy in epoch N+1 by increasing the corresponding weight. Finally, the weights are normalized.
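One way to picture the adaptive weighting is the sketch below. It is an assumption-laden illustration rather than the paper's update rule: it takes the rate of change to be the ratio of the current to the previous discrepancy loss, multiplies each weight by that ratio, and renormalizes. The combined loss (prediction loss plus λ times the weighted discrepancies) is likewise an assumed form, and both function names are hypothetical.

```python
import numpy as np

def update_adaptive_weights(weights, prev_losses, curr_losses, eps=1e-8):
    """Raise the weight of crime type-pairs whose discrepancy loss grew.

    ASSUMPTION: the rate of change is the ratio curr / prev; the exact rule
    used in the paper may differ.
    """
    weights = np.asarray(weights, dtype=float)
    prev_losses = np.asarray(prev_losses, dtype=float)
    curr_losses = np.asarray(curr_losses, dtype=float)
    rate = (curr_losses + eps) / (prev_losses + eps)  # > 1 if the loss grew
    updated = weights * rate
    return updated / updated.sum()  # normalize so the weights sum to 1

def total_loss(prediction_loss, pair_losses, weights, lam):
    """Assumed combined objective: prediction loss + lam * weighted discrepancy."""
    return prediction_loss + lam * float(np.dot(weights, pair_losses))

# Two crime type-pairs; only the first pair's discrepancy increased.
weights = np.array([0.5, 0.5])
new_weights = update_adaptive_weights(weights, [1.0, 1.0], [2.0, 1.0])
print(new_weights)
print(total_loss(0.5, [0.2, 0.4], [0.5, 0.5], lam=0.1))
```

In the toy call, the pair whose discrepancy doubled ends up with roughly two thirds of the total weight after normalization.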

5 Experiments

Here, extensive experiments with real-world datasets from a prefecture-level city in China are conducted to evaluate the effectiveness of the proposed CATL model. We mainly aim to answer three questions:

(1) How do the cross-type spatio-temporal correlations and the adaptive methods adopted in the transfer learning process benefit spatio-temporal crime prediction?
(2) What is the performance of variants of the proposed spatio-temporal crime prediction model with different combinations of crime-related auxiliary data?
(3) How does the proposed CATL model perform compared to state-of-the-art baselines?

5.1 Settings

5.1.1 Dataset description

In accordance with data security and privacy protocols, the source province is referred to anonymously as “J Province” in this paper. The experiment was conducted in a prefecture-level city in J Province, China, which covers an area of 668 km² and has a registered resident population of 714,000. Some sparsely populated areas in the city, such as villages and farmland, have relatively low crime rates and hold little significance for spatio-temporal crime prediction. Hence, the central urban area of the city was chosen as the research area. The selected area covers 243 km², accounting for 30.8% of the registered resident population. We collected five types of crime data (i.e., burglary, assault, rape, drugs, and gambling) from January 2014 to August 2021, totaling 32,531 records. The sparsity level of each crime type, quantified by average records per 2 km × 2 km grid per day, is as follows: burglary (0.12), assault (0.0035), drugs (0.0062), gambling (0.0025), and rape (0.0013), showing varying degrees of sparsity consistent with spatio-temporal granularity constraints. These data were pre-processed through a standard spatio-temporal data processing pipeline: each record was mapped to a 2 km × 2 km grid and aggregated into daily time slots to form the model's input tensors. To ensure privacy, all personal identifiers were removed, and precise locations and timestamps were obfuscated through spatial aggregation and daily temporal aggregation, respectively. This selection was based on data availability and operational definitions from our data source; we note that the data reflect reported incidents. The processed data were integrated into structured tensors compatible with the model input requirements. The crime-related environmental features include weather, POI, and other features such as historical surveillance count (the number of registered public security cameras in the target grid during the corresponding time slot) and population count, as detailed in Table 1.

Table 1

Category	Variable	Description
Weather features	Temperature	Temperature variance, maximum, and minimum values in time slots
	Atmospheric pressure	Atmospheric pressure variance, maximum, and minimum values in time slots
	Humidity	Humidity variance, maximum, and minimum values in time slots
	Rainfall	Rainfall variance, maximum, and minimum values in time slots
	Wind speed	Wind speed variance, maximum, and minimum values in time slots
POI features	Public sector industry	Public activity places, including KTV, tea houses, large shopping malls, cinemas, video game rooms, bars, beauty salons, chess and card rooms, internet cafes, dance halls, bathhouses, foot massage parlors, and other publicly accessible venues with high human traffic
	Special industry	Places engaged in special industries, including pawn shops, hotels, used car dealerships, real estate agencies, scrap recycling, seal engraving, automotive repair, courier logistics, consignment businesses, jewelry trade, locksmith services, car rental, mobile communications, material distribution, and plate printing
	Ordinary unit	Places that are not considered as the key, public, special, or hazardous units, including ordinary companies, restaurants, factories, fruit and vegetable stalls, department stores, pharmacies, etc.
	Major unit	Places that need special protection by public security authorities, such as stations, urban water supply facilities, government offices, gas stations, gas companies, schools, hospitals, banks, etc.
Other features	Historical surveillance count	Number of registered public security cameras in each region in time slots
	Population count	Population counts in each region in time slots

Crime-related auxiliary features.

The distributions of POI, surveillance, and population are illustrated in Figure 2.

Figure 2
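The grid-and-day aggregation and the sparsity level described in the dataset description can be sketched as follows. This is an illustrative sketch, not the authors' pipeline: it assumes event coordinates are already projected to metres measured from the south-west corner of the study area, and the function names and toy events are hypothetical.

```python
import numpy as np

def aggregate_to_grid(x_m, y_m, day_idx, cell_size_m, n_cols, n_rows, n_days):
    """Count events per (day, grid cell).

    x_m, y_m: event coordinates in metres from the study area's south-west
              corner (ASSUMED to be projected already).
    day_idx:  integer day index of each event, starting at 0.
    Returns an integer array of shape (n_days, n_rows * n_cols).
    """
    col = np.floor(np.asarray(x_m, dtype=float) / cell_size_m).astype(int)
    row = np.floor(np.asarray(y_m, dtype=float) / cell_size_m).astype(int)
    day = np.asarray(day_idx, dtype=int)
    # Keep only events that fall inside the grid and the time range.
    inside = (
        (col >= 0) & (col < n_cols)
        & (row >= 0) & (row < n_rows)
        & (day >= 0) & (day < n_days)
    )
    region = row[inside] * n_cols + col[inside]
    counts = np.zeros((n_days, n_rows * n_cols), dtype=int)
    # np.add.at accumulates correctly when the same cell appears repeatedly.
    np.add.at(counts, (day[inside], region), 1)
    return counts

def sparsity_level(counts):
    """Average number of records per grid cell per day."""
    return counts.sum() / counts.size

# Toy example: a 1 x 2 grid of 2 km cells over 2 days; the last event lies
# outside the grid and is dropped.
counts = aggregate_to_grid(
    x_m=[100.0, 2500.0, 2600.0, 9000.0],
    y_m=[100.0, 100.0, 150.0, 100.0],
    day_idx=[0, 0, 0, 1],
    cell_size_m=2000.0, n_cols=2, n_rows=1, n_days=2,
)
print(counts)
print(sparsity_level(counts))
```

Applied per crime type, `sparsity_level` corresponds to the "average records per grid per day" figure quoted for each category.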

5.1.2 Experimental setup

The basic experimental setup is summarized as follows:

(1) Applying a 2 km × 2 km grid unit, we generated a total of 90 spatial regions. We set a 1-day period as the fixed temporal granularity. Crucially, even at this 2 km × 2 km and daily resolution, the crime event data exhibit substantial sparsity (with many grid-day cells containing zero events), establishing a meaningful and challenging testbed for evaluating our model's capability to mitigate sparsity issues. The dataset was divided into training, validation, and test sets in a 7:1:2 ratio. The split was performed in strict chronological order. Furthermore, a gap equal to the input sequence length (T) was introduced between the sets to prevent data leakage caused by the sliding window. This procedure ensures that the sliding windows are constructed independently within each chronologically ordered subset, eliminating any temporal overlap and thus preventing the data leakage scenario of concern.
(2) A full grid search was performed for hyperparameter optimization of the ConvLSTM models. Specifically, for all baseline models and our proposed CATL model, we conducted independent hyperparameter tuning to ensure a fair comparison and avoid experimental bias. For each model, we performed a grid search over key hyperparameters (e.g., number of layers, hidden units, learning rate, batch size, and regularization weight λ) and selected the configuration that yielded the best validation performance through iterative training. A series of comparative experiments yielded the hyperparameter settings. Adam with default parameters was used as the optimizer, and the learning rate was set to 0.001. The spatial feature extraction module and temporal feature extraction module consist of multivariate convolutional layers and LSTM layers. The batch size was selected from the candidate set {10, 12, 14, 16, 18, 20, 22, 24}. The weight for the regularization term λ in Equation 12 was selected from the range of [0.0, 1.0] with a step size of 0.1.
(3) The crime prediction performance is evaluated using mean absolute error (MAE) and mean squared error (MSE), which are formulated as follows:
MAE = (1/n) Σ_{i=1}^{n} |y_i − ŷ_i|
MSE = (1/n) Σ_{i=1}^{n} (y_i − ŷ_i)²
where n is the number of test samples and y_i and ŷ_i are the observed and predicted crime counts, respectively. MAE and MSE provide complementary perspectives: MAE reflects the average error magnitude, while MSE is more sensitive to large, sporadic prediction errors, a critical consideration in public safety applications.
(4) In the evaluation, the results are the median values obtained after conducting 10 independent experiments. Here, independent experiments refer to using 10 different random seeds to control the randomness of model weight initialization and data division, ensuring the reproducibility of the results. We chose the median instead of the mean because some crime types (e.g., rape and gambling) have extremely sparse data, and occasional outlier values may occur in repeated experiments. The median is more robust to such outliers and can objectively reflect the true performance of the model.
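The chronological split with a gap of T slots, the two error metrics, and the median over repeated runs can be sketched as follows. This is a minimal sketch under assumptions: the helper names are hypothetical, the 7:1:2 ratio is applied to the slots that remain after reserving the two gaps, and the per-seed scores are made-up numbers used only to show why a median is robust to an outlying run.

```python
import numpy as np

def chronological_split(num_slots, T, ratios=(7, 1, 2)):
    """Split time-slot indices in time order, leaving T unused slots
    between consecutive subsets so that sliding windows cannot overlap."""
    usable = num_slots - 2 * T  # slots left after reserving the two gaps
    if usable <= 0:
        raise ValueError("not enough time slots for the requested gaps")
    total = sum(ratios)
    n_train = usable * ratios[0] // total
    n_val = usable * ratios[1] // total
    train = np.arange(0, n_train)
    val_start = n_train + T
    val = np.arange(val_start, val_start + n_val)
    test_start = val_start + n_val + T
    test = np.arange(test_start, num_slots)
    return train, val, test

def mae(y_true, y_pred):
    """Mean absolute error."""
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))

def mse(y_true, y_pred):
    """Mean squared error."""
    return float(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2))

train, val, test = chronological_split(num_slots=100, T=5)
print(len(train), len(val), len(test))

# Hypothetical per-seed scores: one outlying run barely moves the median.
seed_scores = [0.024, 0.025, 0.023, 0.090, 0.024]
print(np.median(seed_scores), np.mean(seed_scores))
```

Using integer ratios avoids floating-point rounding when sizing the subsets; with 100 slots and T = 5 the subsets contain 63, 9, and 18 slots.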

5.1.3 Baselines for comparison

We compared the performance of the proposed CATL model with the following baseline methods.

TGCN (Chen et al., 2020): The tag graph convolutional network is a valuable tool for capturing both spatial and temporal dependencies and for understanding and predicting complex temporal graph-structured data. It has shown promising results in various applications. Here, the TGCN module uses two layers and 1,024 hidden nodes.

DCRNN (Li et al., 2017): The diffusion convolutional recurrent neural network uses diffusion convolution to model spatial dependencies in the data, taking into account the structural connections between different locations. We chose the number of neighboring nodes K as 2, with 1,024 hidden nodes.

GRU (Dey and Salem, 2017): The gated recurrent unit neural network is a high-performing and widely used predictive model with a simple structure, fast training speed, and the ability to effectively capture temporal features.

LSTM (Graves, 2012): The long short-term memory neural network is widely used in time-series forecasting, effectively handling long-term dependencies and being suitable for multivariable predictions.

ConvLSTM (Khosravinia et al., 2023): Convolutional LSTM can capture spatiotemporal information from the input data while considering spatiotemporal sequence features. The CNN module consists of two layers, and the number of LSTM hidden nodes is set to 1,024.

GConvLSTM (Yuan et al., 2018): Graph convolutional LSTM is designed for spatiotemporal data and can integrate spatiotemporal information, handle complex relationships, and efficiently share parameters. In this case, the GCN module has two layers, and the number of LSTM hidden nodes is set to 1,024.

STGCN (Yu et al., 2018): The spatio-temporal graph convolutional network is designed for spatiotemporal prediction. It can simultaneously capture temporal and spatial relationships and offers numerous advantages when dealing with intricate spatiotemporal relationships and graph-structured data, including low computational complexity and efficient feature processing.

PDFormer (Jiang et al., 2023): The propagation delay-aware dynamic long-range transformer is designed for spatiotemporal prediction, originally for traffic flow. It uses self-attention to capture dynamic, long-range spatial dependencies together with temporal dependencies, and it explicitly models the delay with which information propagates between locations.

5.2 Ablation study on transfer learning components

As described in Section 4.2, an adaptive weighting mechanism is incorporated to enable the model to fully learn the probability distributions among different types of crimes. It ensures that the proposed CATL model does not overly prioritize the differences in probability distributions between certain pairs of crime types during the learning process, thereby neglecting relationships with other crime types. Figure 3 illustrates the weight update process, where the x-axis represents the training epoch and the y-axis denotes the value of the adaptive weight. When predicting a certain type of crime, theoretically, the probability distribution difference for the same type of crime should be 0 during the training process. Thus, the weight for this aspect should eventually approach 0. To validate the correctness of our model, we also incorporate the calculation of the probability distribution difference between the model and itself during the computation process. The observations from Figure 3 confirm this perspective, thereby validating our approach.

Figure 3

Furthermore, to better understand the CATL model and validate its effectiveness, we compared the full model with two ablated variants: the model without transfer learning (i.e., ConvLSTM, referred to as “BaseConvLSTM” here) and the model without adaptive weight updating (“CATL w/o Adaptive”). Table 2 presents the MAE and MSE of the three compared methods, and Figure 4 shows the improvement in MAE and MSE of our model compared to the other two methods. From the results, two key observations are summarized as follows.

Table 2

Crime category	BaseConvLSTM		CATL w/o Adaptive		CATL Model
	MAE	MSE	MAE	MSE	MAE	MSE
Burglary	0.4231	0.8869	0.4212	0.8877	0.3767	0.6901
Assault	0.0388	0.0137	0.0278	0.0115	0.0242	0.0108
Rape	0.0167	0.0093	0.0144	0.0093	0.0129	0.0093
Drugs	0.0544	0.0233	0.0377	0.0156	0.0321	0.0141
Gambling	0.0298	0.0226	0.0274	0.0217	0.0244	0.0217

Performance of different methods.

Figure 4

(1) Our model achieves the lowest MAE for every crime type and the lowest or tied MSE, and CATL w/o Adaptive achieves a lower MAE than BaseConvLSTM for every crime type. This indicates that there are indeed certain underlying relationships between the probability distributions of burglary, assault, drugs, and other crime types. The model consistently focuses on the differences in probability distributions among various crime types in each epoch through the adaptive method adopted in the transfer learning process, allowing it to effectively capture the mutual influence between crimes. At the same time, it demonstrates that the proposed CATL leverages feature sharing in transfer learning to alleviate the issue of data sparsity.

(2) Our model is particularly sensitive to the distributions associated with burglary, assault, and drugs, as evidenced by the significant improvement in both MAE and MSE for these three crime types in Figure 4. This implies that the proposed CATL shows greater robustness in predicting these three types of crimes. For rape and gambling, on the other hand, there is a noticeable improvement in MAE, while the MSE remains unchanged or improves only slightly. The 0% MSE improvement for rape in Figure 4 corresponds to the identical MSE value (0.0093) of all three models in Table 2, indicating no difference in squared prediction error; a negative percentage would be recorded if our model performed worse. Given that MAE reflects the typical error magnitude whereas MSE is highly sensitive to large errors, this suggests that the model strives to minimize errors as much as possible during optimization.
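
The percentages behind these observations can be recomputed from Table 2. The sketch below assumes that the improvement plotted in Figure 4 is the relative error reduction, (baseline - CATL) / baseline; that definition and the helper names are assumptions, since the figure's formula is not restated in this section.

```python
# (MAE, MSE) pairs copied from Table 2.
TABLE2 = {
    "Burglary": {"BaseConvLSTM": (0.4231, 0.8869), "CATL w/o Adaptive": (0.4212, 0.8877), "CATL": (0.3767, 0.6901)},
    "Assault":  {"BaseConvLSTM": (0.0388, 0.0137), "CATL w/o Adaptive": (0.0278, 0.0115), "CATL": (0.0242, 0.0108)},
    "Rape":     {"BaseConvLSTM": (0.0167, 0.0093), "CATL w/o Adaptive": (0.0144, 0.0093), "CATL": (0.0129, 0.0093)},
    "Drugs":    {"BaseConvLSTM": (0.0544, 0.0233), "CATL w/o Adaptive": (0.0377, 0.0156), "CATL": (0.0321, 0.0141)},
    "Gambling": {"BaseConvLSTM": (0.0298, 0.0226), "CATL w/o Adaptive": (0.0274, 0.0217), "CATL": (0.0244, 0.0217)},
}

def improvement_pct(baseline: float, ours: float) -> float:
    """Assumed definition: relative error reduction in percent (positive = CATL better)."""
    return 100.0 * (baseline - ours) / baseline

def improvements(table):
    """Return {crime: {baseline_name: (mae_gain_pct, mse_gain_pct)}}."""
    out = {}
    for crime, rows in table.items():
        ours_mae, ours_mse = rows["CATL"]
        out[crime] = {
            name: (improvement_pct(mae, ours_mae), improvement_pct(mse, ours_mse))
            for name, (mae, mse) in rows.items()
            if name != "CATL"
        }
    return out

RESULTS = improvements(TABLE2)
for crime, gains in RESULTS.items():
    for name, (mae_gain, mse_gain) in gains.items():
        print(f"{crime:9s} vs {name:18s} MAE {mae_gain:6.2f}%  MSE {mse_gain:6.2f}%")
```

Under this definition, identical MSE values such as the 0.0093 reported for rape yield exactly 0%, and a model that performed worse than a baseline would produce a negative value.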

5.4 Impact of environmental features (weather/POI) on prediction

Effective feature selection has a significant impact on the performance of the spatio-temporal crime prediction model. To verify the improvement in prediction performance brought by weather and POI factors, and whether the proposed model can effectively perceive these two factors, we conducted an ablation study. To isolate the contribution of environmental factors, we compared three model configurations: a baseline excluding both weather and POI; an intermediate model adding weather features; and our full model integrating both. This sequential ablation cleanly quantifies the marginal gain from dynamic weather signals and the additional benefit of static POI context. To better illustrate the model's ability to perceive weather and POI factors, we present in Table 3 and Figure 5 the prediction results of different crime types across a range of batch sizes. This additional dimension of analysis allows us to verify that the observed influence of environmental features is consistent and not an artifact of a specific training batch configuration. Overall, the full model (with both features) outperforms the two ablated variants, demonstrating that our model effectively captures the relationships between crime and both weather and POI distributions. Specifically, we make the following observations.

(1) For burglary and assault, the fluctuations in MAE and MSE are substantial when ignoring POI and weather features. After considering weather features, MAE and MSE show significant improvements, especially for burglary. Furthermore, the overall prediction trend of the model becomes more stable when both weather and POI features are incorporated, resulting in MAE and MSE that are significantly lower than in the other two scenarios. This indicates that weather and POI have a substantial impact on burglary and assault, and our model effectively captures the relationship between these factors and the changing patterns of these two crimes.
(2) For rape, drugs, and gambling, the values of MAE and MSE remain relatively stable after incorporating weather features, but the improvement in prediction performance is not very significant compared with the scenario that ignores weather and POI. A notable improvement is observed when POI distribution data are added alongside weather features, especially for drugs. The MSE for gambling shows very slight fluctuations, which we attribute mainly to gambling's high concealment: it leads to variable reported data and a relatively small sample size that amplifies random training noise. Fluctuations of this magnitude are negligible and can be disregarded. This suggests that rape, drugs, and gambling differ from the other crime types, as they exhibit stronger correlations with specific population distributions influenced by psychological motives and social contexts, while being less sensitive to weather factors (Tipping et al., 2025).

Table 3

Batch size	Considering weather and POI
	Burglary		Assault		Rape		Drugs		Gambling
	MAE	MSE	MAE	MSE	MAE	MSE	MAE	MSE	MAE	MSE
10	0.3530	0.5812	0.0243	0.0109	0.0133	0.0093	0.0330	0.0144	0.0251	0.0217
12	0.3572	0.6187	0.0245	0.0109	0.0131	0.0093	0.0326	0.0143	0.0246	0.0217
14	0.3611	0.6009	0.0245	0.0109	0.0136	0.0093	0.0331	0.0145	0.0246	0.0217
16	0.3668	0.6629	0.0251	0.0110	0.0130	0.0093	0.0328	0.0143	0.0243	0.0217
18	0.3691	0.6703	0.0247	0.0109	0.0135	0.0093	0.0328	0.0144	0.0246	0.0218
20	0.3755	0.6883	0.0249	0.0110	0.0145	0.0093	0.0332	0.0145	0.0250	0.0217
22	0.3763	0.6833	0.0248	0.0109	0.0132	0.0093	0.0331	0.0145	0.0249	0.0218
24	0.3739	0.6823	0.0251	0.0110	0.0129	0.0093	0.0329	0.0142	0.0251	0.0218
Batch size	Considering weather
10	0.3853	0.6893	0.0262	0.0111	0.0139	0.0093	0.0374	0.0156	0.0257	0.0217
12	0.3933	0.6920	0.0259	0.0111	0.0139	0.0093	0.0372	0.0155	0.0256	0.0217
14	0.4010	0.7953	0.0269	0.0113	0.0149	0.0093	0.0374	0.0157	0.0254	0.0217
16	0.4130	0.8182	0.0270	0.0113	0.0157	0.0094	0.0373	0.0156	0.0253	0.0217
18	0.4187	0.8501	0.0275	0.0114	0.0140	0.0093	0.0380	0.0155	0.0255	0.0217
20	0.4227	0.8854	0.0283	0.0114	0.0145	0.0093	0.0380	0.0158	0.0253	0.0217
22	0.4213	0.8678	0.0278	0.0114	0.0137	0.0093	0.0386	0.0149	0.0252	0.0217
24	0.4201	0.8434	0.0271	0.0108	0.0137	0.0093	0.0388	0.0152	0.0252	0.0217
Batch size	Ignoring weather and POI
10	0.3984	0.5980	0.0270	0.0113	0.0160	0.0093	0.0376	0.0156	0.0256	0.0218
12	0.5704	1.3079	0.0269	0.0112	0.0150	0.0093	0.0373	0.0155	0.0257	0.0217
14	0.6692	2.3740	0.0278	0.0113	0.0151	0.0093	0.0376	0.0156	0.0257	0.0217
16	0.5536	1.2358	0.0274	0.0114	0.0149	0.0093	0.0353	0.0157	0.0260	0.0217
18	0.4832	1.0501	0.0278	0.0114	0.0146	0.0093	0.0377	0.0156	0.0258	0.0217
20	0.6588	2.2882	0.0281	0.0115	0.0142	0.0093	0.0380	0.0159	0.0257	0.0216
22	0.4490	0.8053	0.0278	0.0114	0.0143	0.0093	0.0378	0.0157	0.0258	0.0217
24	0.6380	1.9824	0.0279	0.0115	0.0143	0.0093	0.0379	0.0155	0.0256	0.0217

Prediction results of different types of crimes under different batch sizes.
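
The stability claim for burglary can be checked directly from Table 3. The following sketch summarizes the burglary MAE across the eight batch sizes for each configuration. Using the mean, population standard deviation, and range as stability indicators is our own choice for illustration, not a procedure stated by the authors.

```python
from statistics import mean, pstdev

# Burglary MAE for batch sizes 10, 12, ..., 24, copied from Table 3.
BURGLARY_MAE = {
    "weather + POI":      [0.3530, 0.3572, 0.3611, 0.3668, 0.3691, 0.3755, 0.3763, 0.3739],
    "weather only":       [0.3853, 0.3933, 0.4010, 0.4130, 0.4187, 0.4227, 0.4213, 0.4201],
    "no weather, no POI": [0.3984, 0.5704, 0.6692, 0.5536, 0.4832, 0.6588, 0.4490, 0.6380],
}

# For each configuration: (mean MAE, standard deviation, max - min).
SUMMARY = {
    cfg: (mean(values), pstdev(values), max(values) - min(values))
    for cfg, values in BURGLARY_MAE.items()
}

for cfg, (avg, std, spread) in SUMMARY.items():
    print(f"{cfg:20s} mean={avg:.4f}  std={std:.4f}  range={spread:.4f}")
```

Both the average error and its spread across batch sizes shrink as weather and then POI features are added, which is the pattern described in observation (1).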

Figure 5

5.5 Performance comparison for crime prediction

Table 4 shows the MSE and MAE of all comparative methods. Our model achieves the lowest MAE for every crime type and the lowest or tied-lowest MSE for every crime type except rape, where PDFormer reports a slightly lower MSE (0.0083 vs. 0.0093). Notably, the model achieves more pronounced performance gains for crime types with higher sparsity (e.g., rape with 0.09 records per grid per day and gambling with 0.17 records), verifying its effectiveness in mitigating data sparsity. We attribute these improvements to the following factors:

(1) By employing transfer learning, the knowledge gained from other crime distributions can be used to predict a specific type of crime, facilitating feature sharing across different types of crimes.
(2) Adaptive weighting enables the model to learn effectively from different types of crime distributions, enhancing its flexibility and robustness against the data sparsity of individual crime types. This mechanism allows the proposed CATL model to adjust the importance of different types of crimes during the training process, thereby improving performance and robustness in predicting various types of crime.
(3) For crime types with relatively low incidence rates, such as gambling, the use of transfer learning can effectively enhance the model's robustness, which is often compromised by data sparsity. By leveraging knowledge from other types of crimes with sufficient data, the CATL model can better generalize and make more accurate predictions for sparse-data crime categories.

Table 4

Model	Crime category
	Burglary		Assault		Rape		Drugs		Gambling
	MAE	MSE	MAE	MSE	MAE	MSE	MAE	MSE	MAE	MSE
TGCN	0.8008	1.0902	0.0477	0.0116	0.0172	0.0096	0.0576	0.0166	0.0444	0.0222
DCRNN	0.8788	1.2071	0.0376	0.0112	0.0206	0.0096	0.0528	0.0163	0.0366	0.0221
STGCN	0.7433	1.8910	0.0279	0.0108	0.0172	0.0095	0.0505	0.0173	0.0364	0.0223
LSTM	0.5599	1.2433	0.0677	0.0142	0.0481	0.0104	0.0932	0.0249	0.0480	0.0220
GRU	0.6323	0.9946	0.0569	0.0131	0.0745	0.0133	0.0697	0.0232	0.0494	0.0228
ConvLSTM	0.4231	0.8869	0.0388	0.0137	0.0167	0.0093	0.0544	0.0233	0.0298	0.0226
GConvLSTM	0.8593	1.2329	0.0364	0.0112	0.0200	0.0096	0.0492	0.0161	0.0371	0.0221
PDFormer	0.5132	0.7931	0.0361	0.0112	0.0147	0.0083	0.0399	0.0149	0.0273	0.0220
Our Model	0.3767	0.6901	0.0242	0.0108	0.0129	0.0093	0.0321	0.0141	0.0244	0.0217

Comparison between different models in terms of MAE and MSE.

Bold values denote the minimum MAE and MSE for each crime category, indicating the best predictive performance among the compared models.

These results demonstrate that the proposed transfer learning mechanism effectively leverages shared patterns across crime types to improve prediction robustness within the studied urban context, especially for categories with sparse records.
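
To make the size of the margin concrete, the sketch below takes the MAE columns of Table 4, finds the strongest baseline for each crime type, and reports how far our model improves on it. The relative-reduction formula and the helper names are assumptions introduced for illustration.

```python
# MAE values copied from Table 4, in the column order listed in CRIMES.
CRIMES = ["Burglary", "Assault", "Rape", "Drugs", "Gambling"]
MAE = {
    "TGCN":      [0.8008, 0.0477, 0.0172, 0.0576, 0.0444],
    "DCRNN":     [0.8788, 0.0376, 0.0206, 0.0528, 0.0366],
    "STGCN":     [0.7433, 0.0279, 0.0172, 0.0505, 0.0364],
    "LSTM":      [0.5599, 0.0677, 0.0481, 0.0932, 0.0480],
    "GRU":       [0.6323, 0.0569, 0.0745, 0.0697, 0.0494],
    "ConvLSTM":  [0.4231, 0.0388, 0.0167, 0.0544, 0.0298],
    "GConvLSTM": [0.8593, 0.0364, 0.0200, 0.0492, 0.0371],
    "PDFormer":  [0.5132, 0.0361, 0.0147, 0.0399, 0.0273],
    "Our Model": [0.3767, 0.0242, 0.0129, 0.0321, 0.0244],
}

def best_baseline(col: int):
    """Return (name, MAE) of the baseline with the lowest MAE in column `col`."""
    candidates = {m: v[col] for m, v in MAE.items() if m != "Our Model"}
    name = min(candidates, key=candidates.get)
    return name, candidates[name]

# For each crime type: (strongest baseline, relative MAE reduction in percent).
MARGINS = {}
for i, crime in enumerate(CRIMES):
    name, base = best_baseline(i)
    ours = MAE["Our Model"][i]
    MARGINS[crime] = (name, 100.0 * (base - ours) / base)

for crime, (name, pct) in MARGINS.items():
    print(f"{crime:9s} best baseline: {name:9s} MAE reduction: {pct:5.2f}%")
```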

6 Conclusion

Spatio-temporal crime prediction differs from traditional spatio-temporal predictions of traffic flow, temperature, and air quality due to the challenge of data sparsity. Considering the co-occurrence phenomenon among different types of crime, an adaptive transfer learning training approach is adopted to fully leverage crime data and mitigate data sparsity. The proposed approach facilitates feature sharing among different types of crime, enabling a comprehensive exploration of underlying relationships. This enhances the model's ability to recognize potential cross-type spatio-temporal correlations and improves its prediction performance and robustness when dealing with sparse and heterogeneous crime data. In practical applications, the proposed model can help police departments implement more targeted patrol strategies—for example, by allocating more resources to high-risk regions predicted for sparse crime types (e.g., rape and gambling) that are often overlooked, thereby improving the overall efficiency of urban safety management. In summary, we can draw the following conclusions:

(1) The study introduces a transfer learning framework based on ConvLSTM. When predicting a specific type of crime, this framework extracts features from other types of crimes and incorporates them into the training process. To balance the model's attention toward distribution discrepancy losses across different types of crimes, an adaptive weight updating method that utilizes the rate of change of distribution discrepancy losses is employed. Experimental results show that the proposed approach can extract underlying relationships among different types of crime and enhance the model's prediction performance. Ablation results confirm the critical role of transfer learning in alleviating data sparsity: removing it increases MAE by 29.6% for rape and 22.1% for gambling compared to the full model. Removing adaptive weight updating further leads to 11.9–13.6% higher MAE than the full model, as the model then fails to balance learning across crime types with varying sample sizes.
(2) The surrounding environment can influence the distribution of crimes. Building upon the consideration of meteorological data, we introduced the distribution of POIs and other features (e.g., population data) and analyzed the model's ability to perceive these features. Experimental results show that compared to the scenario that only considers weather features, the predictive performance for burglary, gambling, and drug crimes significantly improves when incorporating POI and population data. This further confirms the substantial impact of POI and other features on these specific types of crimes. POI features drive more pronounced improvements than weather—drug crimes show a 14.5% lower MAE when adding POIs to weather, while assault and burglary benefit more from weather, with an 8.3–10.1% MAE reduction vs. no environmental features, reflecting their different correlations with human activity and mobility.
(3) Compared to traditional spatio-temporal prediction models, our proposed approach achieves superior predictive performance across all crime types and effectively captures features among different types of crimes. More importantly, it demonstrates a robust capability to handle the dual challenge inherent to urban crime prediction: leveraging sparse data through cross-type knowledge transfer and integrating heterogeneous urban and environmental features to uncover complex, crime-specific patterns.
(4) While the proposed CATL model demonstrates effectiveness in mitigating data sparsity within the studied urban environment, this study has limitations that suggest directions for future work. Our evaluation was conducted on data from a single city and at a specific granularity. Thus, while the model shows strong robustness in handling intra-city data sparsity and cross-type correlations, its generalizability to cities with different geographic, demographic, and crime patterns requires further validation.

Future research will focus on three specific directions: first, cross-city evaluation using datasets from cities with varying geographic conditions such as coastal areas, inland regions, and diverse topographies, as well as demographic characteristics including differing population densities to verify generalizability; second, testing on finer-grained data such as hourly intervals or 1 km × 1 km grids while developing targeted strategies to address extreme sparsity; and third, integrating domain knowledge from criminal psychology and urban planning into the transfer learning framework to further enhance prediction accuracy.

Statements

Data availability statement

The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

Author contributions

CC: Software, Formal analysis, Writing – original draft, Conceptualization. ZZ: Writing – review & editing, Methodology, Project administration. HD: Investigation, Validation, Writing – review & editing, Funding acquisition. WW: Data curation, Visualization, Resources, Writing – review & editing.

Funding

The author(s) declared that financial support was received for this work and/or its publication. This work was supported by National Natural Science Foundation of China (No. 42471283), Zhejiang Provincial “Jianbing Lingyan + x” Science and Technology Program (No. 2025C01030), and Public Welfare Technology and Industry Project of Zhejiang Provincial Science Technology Department (No. LGF21F020006).

Conflict of interest

WW was employed by Zhejiang SUPCON Information Co., Ltd.

The remaining author(s) declared that this work was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Generative AI statement

The author(s) declared that generative AI was not used in the creation of this manuscript.

Any alternative text (alt text) provided alongside figures in this article has been generated by Frontiers with the support of artificial intelligence and reasonable efforts have been made to ensure accuracy, including review by the authors wherever possible. If you identify any issues, please contact us.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

References

Alves, L. G., Ribeiro, H. V., and Rodrigues, F. A. (2018). Crime prediction through urban metrics and statistical learning. Phys. Stat. Mech. Appl. 505, 435–443. doi: 10.1016/j.physa.2018.03.084
Andresen, M. A., and Malleson, N. (2011). Testing the stability of crime patterns: implications for theory and policy. J. Res. Crime Delinquency 48, 58–82. doi: 10.1177/0022427810384136
Bowers, K. J., Johnson, S. D., and Pease, K. (2004). Prospective hot-spotting: the future of crime mapping? Br. J. Criminol. 44, 641–658. doi: 10.1093/bjc/azh036
Calatayud, J., Jornet, M., and Mateu, J. (2023). Modeling noisy time-series data of crime with stochastic differential equations. Stochastic Environ. Res. Risk Assess. 37, 1053–1066. doi: 10.1007/s00477-022-02334-8
Chainey, S., and Ratcliffe, J. (2013). GIS and Crime Mapping. Hoboken, NJ: John Wiley and Sons, 320p.
Chen, B., Guo, W., Tang, R., Xin, X., Ding, Y., He, X., et al. (2020). “TGCN: tag graph convolutional network for tag-aware recommendation,” in Proceedings of the 29th ACM International Conference on Information and Knowledge Management, October 19–23, 2020, Virtual, USA (New York, NY: ACM), 155–164. doi: 10.1145/3340531.3411927
Deng, M., Tan, X. Y., Chen, K. Q., Liu, B. J., Zhao, Z. Y., Tu, Y. J., et al. (2025). Predicting crowd flows via compressed sensing with spatial heterogeneity: an efficient geoai framework. Int. J. Geograph. Inf. Sci. 8, 1–32. doi: 10.1080/13658816.2025.2541193
Deng, Y., He, R., and Liu, Y. (2023). Crime risk prediction incorporating geographical spatiotemporal dependency into machine learning models. Inf. Sci. 646:119414. doi: 10.1016/j.ins.2023.119414
Dey, R., and Salem, F. M. (2017). “Gate-variants of gated recurrent unit (GRU) neural networks,” in Proceedings of the 2017 IEEE 60th International Midwest Symposium on Circuits and Systems (MWSCAS), August 6–9, 2017, Boston, MA (Piscataway: IEEE). doi: 10.1109/MWSCAS.2017.8053243
D'Orsogna, M. R., and Perc, M. (2015). Statistical physics of crime: a review. Phys. Life Rev. 12, 1–21. doi: 10.1016/j.plrev.2014.11.001
Duan, L., Hu, T., Cheng, E., Zhu, J., and Gao, C. (2017). “Deep convolutional neural networks for spatiotemporal crime prediction,” in Proceedings of the International Conference on Information and Knowledge Engineering (IKE), July 10–13, 2017, Las Vegas, NV (Los Alamitos: IEEE Computer Society), 61–67. Available online at: https://api.semanticscholar.org/CorpusID:43937446 (Accessed January 1, 2026).
Gao, S. (2021). Geospatial Artificial Intelligence (GeoAI). New York, NY: Oxford University Press. doi: 10.1093/obo/9780199874002-0228
Gao, Y., Yin, D., Zhao, X., Wang, Y., and Huang, Y. (2022). Prediction of telecommunication network fraud crime based on regression-LSTM model. Wireless Commun. Mobile Comput. 2022:3151563. doi: 10.1155/2022/3151563
Gopalakrishnan, K., Khaitan, S. K., Choudhary, A., and Agrawal, A. (2017). Deep convolutional neural networks with transfer learning for computer vision-based data-driven pavement distress detection. Construct. Build. Mater. 157, 322–330. doi: 10.1016/j.conbuildmat.2017.09.110
Graves, A. (2012). Long Short-Term Memory. Supervised Sequence Labelling with Recurrent Neural Networks. Berlin, Heidelberg: Springer Berlin Heidelberg, 37–45. doi: 10.1007/978-3-642-24797-2_4
Han, X., Hu, X., Wu, H., Shen, B., and Wu, J. (2020). Risk prediction of theft crimes in urban communities: an integrated model of LSTM and ST-GCN. IEEE Access 8, 217222–217230. doi: 10.1109/ACCESS.2020.3041924
Hart, R., Pedersen, W., and Skardhamar, T. (2022). Blowing in the wind? Testing the effect of weather on the spatial distribution of crime using generalized additive models. Crime Sci. 11:9. doi: 10.1186/s40163-022-00171-2
Huang, X., Cheng, X., Geng, Q., Cao, B., Zhou, D., Wang, P., et al. (2018). “The apolloscape dataset for autonomous driving,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, June 18–23, 2018, Salt Lake City, UT (Piscataway: IEEE), 954–960. doi: 10.1109/CVPRW.2018.00141
Jiang, J., Han, C., Zhao, W. X., and Wang, J. (2023). “Pdformer: propagation delay-aware dynamic long-range transformer for traffic flow prediction,” in Proceedings of the AAAI Conference on Artificial Intelligence, February 7–14, 2023, Washington, DC (Palo Alto: AAAI Press).
Johnson, S. D., Bowers, K. J., Birks, D. J., and Pease, K. (2009). “Predictive mapping of crime by ProMap: accuracy, units of analysis, and the environmental backcloth,” in Putting Crime in Its Place: Units of Analysis in Geographic Criminology, eds. D. Weisburd and T. McEwen (New York, NY: Springer), 171–198. doi: 10.1007/978-0-387-09688-9_8
Kalinic, M., and Krisp, J. M. (2018). “Kernel density estimation (KDE) vs. hot-spot analysis–detecting criminal hot spots in the City of San Francisco,” in Proceedings of the 21st Conference on Geo-information Science, June 12–15, 2018, Lund, Sweden (Berlin: Springer).
Khosravinia, P., Perumal, T., and Zarrin, J. (2023). Enhancing road safety through accurate detection of hazardous driving behaviors with graph convolutional recurrent networks. IEEE Access 11, 52983–52995. doi: 10.1109/ACCESS.2023.3280473
Kidner, D., Higgs, G., and White, S. (2002). Socio-economic Applications of Geographic Information Science. Boca Raton, FL: CRC Press, 289p. doi: 10.1201/b12606
Kim, S., and Lee, S. (2023). Nonlinear relationships and interaction effects of an urban environment on crime incidence: application of urban big data and an interpretable machine learning method. Sustain. Cities Soc. 91:104419. doi: 10.1016/j.scs.2023.104419
Kumar, M., Athulya, S., Minu, M., Vinodini, V., Aiswaria Lakshmi, K. G., Anjana, S., et al. (2018). “Forecasting of annual crime rate in India: a case study,” in Proceedings of the 2018 International Conference on Advances in Computing, Communications and Informatics (ICACCI), September 19–22, 2018, Bangalore, India (Piscataway: IEEE), 2087–2092. doi: 10.1109/ICACCI.2018.8554422
Li, Y., Yu, R., Shahabi, C., and Liu, Y. (2017). Diffusion convolutional recurrent neural network: data-driven traffic forecasting. arXiv [Preprint]. 1707.01926.
Liu, L., Li, W. J., Li, W. W., Yang, H. J., Ji, C., Li, R. P., et al. (2018a). Comparison of random forest algorithm and space-time kernel density mapping for crime hotspot prediction. Progress Geogr. 37, 761–771. doi: 10.18306/dlkxjz.2018.06.003
Liu, L., Du, F., Song, G., Long, D., Jiang, C., Xiao, L., et al. (2018b). Detecting and characterizing symbiotic clusters of crime. Sci. Geogr. Sin. 38, 1199–1209. doi: 10.1007/s11442-018-1520-y
Liu, L., Ji, J., Song, G., Song, G., Liao, W., Yu, H., et al. (2019). Hotspot prediction of public property crime based on spatial differentiation of crime and built environment. J. Geo Inf. Sci. 21, 1655–1668. doi: 10.12082/dqxxkx.2019.190358
Mohler, G. O., Short, M. B., Brantingham, P. J., Schoenberg, F. P., and Tita, G. E. (2011). Self-exciting point process modeling of crime. J. Am. Stat. Assoc. 106, 100–108. doi: 10.1198/jasa.2011.ap09546
Ratcliffe, J. H. (2004). Geocoding crime and a first estimate of a minimum acceptable hit rate. Int. J. Geograph. Inf. Sci. 18, 61–72. doi: 10.1080/13658810310001596076
Ruder, S., Peters, M. E., Swayamdipta, S., and Wolf, T. (2019). “Transfer learning in natural language processing,” in Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Tutorials, June, 2019, Minneapolis, MN, eds. A. Sarkar and M. Strube (Stroudsburg: Association for Computational Linguistics), 15–18. doi: 10.18653/v1/N19-5004
Shahmoradi, N., Alesheikh, A. A., Jafari, A., and Lotfata, A. (2025). Hybrid ST-ResNet and LSTM approach for precise crime hotspot prediction. Sci. Rep. 15:40754. doi: 10.1038/s41598-025-24559-7
Shin, H.-C., Roth, H. R., Gao, M., Lu, L., Xu, Z., Nogues, I., et al. (2016). Deep convolutional neural networks for computer-aided detection: CNN architectures, dataset characteristics and transfer learning. IEEE Trans. Med. Imaging 35, 1285–1298. doi: 10.1109/TMI.2016.2528162
Sun, Y., Chen, T., and Yin, H. (2023). Spatial-temporal meta-path guided explainable crime prediction. World Wide Web 26, 2237–2263. doi: 10.1007/s11280-023-01137-3
Taddy, M. A. (2010). Autoregressive mixture models for dynamic spatial Poisson processes: application to tracking intensity of violent crime. J. Am. Stat. Assoc. 105, 1403–1417. doi: 10.1198/jasa.2010.ap09655
Tipping, S., Wardle, H., and Pryce, R. (2025). The association between increasing levels of gambling harm and emotional health outcomes for individuals who are below the threshold of disordered gambling: a secondary analysis of health data. Public Health 247:105899. doi: 10.1016/j.puhe.2025.105899
Townsley, M., Homel, R., and Chaseling, J. (2003). Infectious burglaries. A test of the near repeat hypothesis. Br. J. Criminol. 43, 615–633. doi: 10.1093/bjc/43.3.615
Wang, B., Luo, X., Zhang, F., Yuan, B., Bertozzi, A. L., Brantingham, P. J., et al. (2018). Graph-based deep modeling and real time forecasting of sparse spatio-temporal data. arXiv [Preprint]. doi: 10.48550/arXiv.1804.00684 (Accessed January 1, 2026).
Wang, Y., Ge, L., Li, S., and Chang, F. (2020). “Deep temporal multi-graph convolutional network for crime prediction,” in Proceedings of the International Conference on Conceptual Modeling, October 19–22, 2020, Vienna, Austria (Berlin: Springer), 543–557.
Weisburd, D. (2015). The law of crime concentration and the criminology of place. Criminology 53, 133–157. doi: 10.1111/1745-9125.12070
Wu, J., Abrar, S. M., Awasthi, N., and Frías-Martínez, V. (2023). Auditing the fairness of place-based crime prediction models implemented with deep learning approaches. Comput. Environ. Urban Syst. 102:101967. doi: 10.1016/j.compenvurbsys.2023.101967
Yan, J., and Hou, M. (2020). Research on time series prediction of theft crime based on LSTM network. Data Anal. Knowl. Discov. 4, 84–91. doi: 10.11925/infotech.2096-3467.2020.0536
Yu, B., Yin, H., and Zhu, Z. (2018). “Spatio-temporal graph convolutional networks: a deep learning framework for traffic forecasting,” in Proceedings of the 27th International Joint Conference on Artificial Intelligence (IJCAI), July 13–19, 2018, Stockholm, Sweden (Pasadena: IJCAI Press), 3634–3640. doi: 10.24963/ijcai.2018/505
Yuan, Z., Zhou, X., and Yang, T. (2018). “Hetero-ConvLSTM: a deep learning approach to traffic accident prediction on heterogeneous spatio-temporal data,” in Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, July 19–23, 2018, London, UK (New York, NY: ACM). doi: 10.1145/3219819.3219922
Zhang, X., Liu, L., Lan, M., Song, G., Xiao, L., Chen, J., et al. (2022). Interpretable machine learning models for crime prediction. Comput. Environ. Urban Syst. 94:101789. doi: 10.1016/j.compenvurbsys.2022.101789
Zhang, Y., and Cheng, T. (2020). Graph deep learning model for network-based predictive hotspot mapping of sparse spatio-temporal events. Comput. Environ. Urban Syst. 79:101403. doi: 10.1016/j.compenvurbsys.2019.101403
Zhao, X., Fan, W., Liu, H., and Tang, J. (2022). “Multi-type urban crime prediction,” in Proceedings of the AAAI Conference on Artificial Intelligence, February 22–March 1, 2022, Virtual (Palo Alto: AAAI Press), 4388–4396. doi: 10.1609/aaai.v36i4.20360
Zhao, X., and Tang, J. (2017). “Modeling temporal-spatial correlations for crime prediction,” in Proceedings of the 2017 ACM Conference on Information and Knowledge Management, November 6–10, 2017, Singapore (New York, NY: ACM), 497–506. doi: 10.1145/3132847.3133024

Summary

Keywords

adaptive weight updating, co-occurrence phenomenon of crimes, spatio-temporal crime prediction, transfer learning, urban crime

Citation

Cui C, Zheng Z, Du H and Wang W (2026) Dynamic transfer learning with co-occurrence-guided multi-source fusion for urban spatio-temporal crime prediction. Front. Big Data 9:1697392. doi: 10.3389/fdata.2026.1697392

Received: 02 September 2025
Revised: 12 December 2025
Accepted: 09 January 2026
Published: 05 February 2026
Volume: 9 - 2026

Edited by: Bruno Lepri, Bruno Kessler Foundation (FBK), Italy

Reviewed by:
Surapati Pramanik, Nandalal Ghosh B.T. College, India
Senzhang Wang, Central South University, China
R. Tamilkodi, Godavari Institute of Engineering and Technology, India
Ariadna Albors Zumel, University of Trento, Italy

This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Ziwan Zheng, zhengziwa@zjjcxy.cn
