Data-integration of opportunistic species observations into hierarchical modeling frameworks improves spatial predictions for urban red squirrels

The prevailing trend of increasing urbanization and habitat fragmentation makes knowledge of species' habitat requirements and distribution a crucial factor in conservation and urban planning. Species distribution models (SDMs) offer powerful toolboxes for identifying the underlying environmental factors driving habitat suitability. Nevertheless, challenges arise when multiple data sets, often sampled with different intentions and therefore different sampling schemes, are meant to complement each other and increase predictive accuracy. Here, we investigate the potential of recent data integration techniques to model potential habitat and movement corridors for Eurasian red squirrels (Sciurus vulgaris) in an urban area. We constructed hierarchical models that integrate data sets of different quality, stemming from unstructured and semi-structured wildlife observation campaigns, in a combined likelihood approach. We compared the results to modeling techniques based on only one data source, and all models were fitted with the same initial selection of environmental variables. Our study highlights the increasing importance of considering multiple data sets for SDMs to enhance their predictive performance. Finally, we used Circuitscape (version 4.0.5) on the most robust SDM to delineate suitable movement corridors for red squirrels as a basis for planning road mortality mitigation measures. Our results indicate that even though red squirrels are common, urban habitats in Berlin are rather small and partially lack connectivity along natural connectivity corridors. Thus, additional fragmentation could push the species closer to the limits of its persistence in urban environments; our results can serve as a template for conservation and management measures.


Introduction
Although urban sprawl is among the main drivers of habitat loss and degradation, cities also offer suitable habitat islands (Angold et al., 2006; Niesner et al., 2021). They can function as stepping stones (Saura et al., 2014; Lynch, 2019) if functional connectivity, i.e., the exchange of genes, biomass or energy, is maintained (Baker and Harris, 2007; Braaker et al., 2014). However, urban wildlife faces increasing challenges due to growing anthropogenic pressure. Predicting the distribution of species in anthropogenic areas is therefore crucial for urban wildlife conservation and should form the basis for urban planning (Fajardo et al., 2014; Casazza et al., 2021). Species distribution models (SDMs) quantify the relationship between observations of a species and the underlying environmental gradients (Araújo and Guisan, 2006; Guisan et al., 2017). They encompass occupancy frameworks, machine-learning algorithms and hierarchical modeling frameworks (Elith and Leathwick, 2009; Kéry et al., 2010; Dorazio, 2014; Hefley and Hooten, 2016). Occupancy models can be considered SDMs, as they estimate the probability that an area is occupied by a species based on (environmental) covariates (Guillera-Arroita et al., 2015).
To obtain accurate predictions from SDMs, it is important to collect many observations that cover the heterogeneous underlying landscape variables (Phillips et al., 2006a; Guillera-Arroita et al., 2015; Guisan et al., 2017). SDM methods are versatile, but they routinely rely on information about sampling locations and protocols to model the species distribution as a function of environmental covariates (Renner et al., 2015; Guisan et al., 2017). Structured sampling prescribes the spatial and temporal sampling design for participants. It is a formal procedure in distribution modeling that yields high-quality data by adding information about the detection process (Royle et al., 2009; Guisan et al., 2017). In contrast, unstructured presence-only data collected opportunistically are often strongly biased and need correction before being used in SDMs (Kramer-Schadt et al., 2013; Guillera-Arroita et al., 2015). Consequently, SDMs based solely on unstructured data sets differ strongly in predictive performance from those based on structured designs (Tiago et al., 2017; Kelling et al., 2019; Planillo et al., 2021).
For SDMs, using multiple data sources simultaneously is a promising avenue to improve ecological inferences and predictions of wildlife population dynamics (Koshkina et al., 2017; Farr et al., 2021). Therefore, data integration frameworks have been developed that link multiple data sources via combined likelihood estimation (Fletcher et al., 2016; Farr et al., 2019). Unified frameworks remain rare. Nevertheless, models using thinned point processes, which remove or retain points according to probabilistic rules, showed that combining unstructured and structured data sources outperforms inferences obtained from single data sources (Dorazio, 2014; Fletcher et al., 2016; Koshkina et al., 2017; Guilbault et al., 2021). A recent hierarchical modeling approach by Renner et al. (2019) uses multiple data sources while accounting for overfitting and spatial dependence of observations via combined likelihood maximization.
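To make the idea of thinning concrete, the following minimal Python sketch (standard library only) has two parts. First, it simulates an inhomogeneous Poisson point process by thinning a homogeneous one (the Lewis-Shedler algorithm). Second, it thins the result again with a spatially varying observation probability. This second step mimics how opportunistic sightings retain only part of the true occurrences. The intensity surface, the observer-effort function and all function names are hypothetical and chosen for illustration. The sketch is not the model of any of the cited studies.

```python
import math
import random


def poisson_count(mean, rng):
    """Draw a Poisson(mean) count by counting unit-rate exponential arrivals up to `mean`."""
    count, t = 0, rng.expovariate(1.0)
    while t <= mean:
        count += 1
        t += rng.expovariate(1.0)
    return count


def simulate_ipp(intensity, lam_max, width, height, rng):
    """Inhomogeneous Poisson process on [0, width] x [0, height] via Lewis-Shedler thinning.

    Candidates come from a homogeneous process with rate lam_max; each is kept with
    probability intensity(x, y) / lam_max, so intensity must never exceed lam_max.
    """
    n = poisson_count(lam_max * width * height, rng)
    points = []
    for _ in range(n):
        x, y = rng.uniform(0, width), rng.uniform(0, height)
        if rng.random() < intensity(x, y) / lam_max:
            points.append((x, y))
    return points


def thin(points, retain_prob, rng):
    """Independent thinning: keep each point with probability retain_prob(x, y)."""
    return [(x, y) for (x, y) in points if rng.random() < retain_prob(x, y)]


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical 'true' intensity increasing towards the east; its maximum (200) is at x = 1.
    true_points = simulate_ipp(lambda x, y: 50 + 150 * x, 200.0, 1.0, 1.0, rng)
    # Hypothetical observer effort peaking at a 'city centre' at (0.5, 0.5).
    effort = lambda x, y: math.exp(-((x - 0.5) ** 2 + (y - 0.5) ** 2) / 0.05)
    observed = thin(true_points, effort, rng)
    print(len(true_points), "true points,", len(observed), "observed")
```

The observed pattern then mixes the true intensity with the effort surface. Separating the two is the task that bias layers and detection models in integrated SDMs address.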
The availability of multiple data sources is often fueled by people's increasing interest in participating in citizen science or, in a wider and more inclusive sense, community science projects (CS; see also National Audubon Society., 2018), often in urban areas. With wider and more consistent availability of smartphones and internet access, CS projects offer opportunities for non-academic scientists to create valuable data sets, for instance through mobile apps (McKinley et al., 2017). Structured designs are possible in CS projects (e.g., Sullivan et al., 2009; Louvrier et al., 2021; Planillo et al., 2021), but the majority follow unstructured designs (van Strien et al., 2013; Kamp et al., 2016; Arazy and Malkinson, 2021). A common bias in unstructured CS projects is that sampling follows a gradient towards easily accessible areas, which participants sample intensely, while other areas remain unsampled (Boakes et al., 2010; Geldmann et al., 2016). Therefore, results of SDMs based solely on unstructured CS data are potentially distorted and often less accurate. Recent efforts combine CS data with data sets from structured sampling (Sullivan et al., 2014; Starkey et al., 2017). Such efforts use supplementary sightings, for example, to validate models built from regularly sampled data sets, but also to train model algorithms and boost their overall predictive power in combined modeling frameworks (Fletcher et al., 2019; Renner et al., 2019; Isaac et al., 2020). Here, we assessed the predictive accuracy of combined penalized hierarchical models using semi-structured and unstructured mammal monitoring data from CS projects in the metropolitan area of Berlin, Germany.
We selected the Eurasian red squirrel (Sciurus vulgaris, henceforth: red squirrel) as our model species for three reasons: its wide distribution across Berlin, its presence in a range of CS projects and its unmistakable appearance. This appearance makes the species easy for non-scientists to identify, so that observations could be verified reliably.
Here, we asked how multiple available data sets could best be used for constructing SDMs and how well models would perform if based on only one data source. We hypothesized that hierarchical SDM methods using data-integration techniques would outperform those considering only a single data source. We predicted that models based on only one data source would show a clear bias towards the conditions under which their data were collected, despite bias correction techniques. Furthermore, we hypothesized that more reliable spatial predictions would allow us to identify important movement corridors that maintain connectivity.

Study area
The city of Berlin (52°31′N, 13°24′E) is located in eastern Germany and is completely surrounded by the federal state of Brandenburg. It covers an area of 892 km² and has a population of more than 3.75 million people (Statistical Office for Berlin-Brandenburg., 2021). The green areas of Berlin range from highly frequented parks in the dense city center to large fragmented forest remnants closer to the administrative border. Built-up areas and streets account for 58.8% of the surface, greenspaces for 12.2%, forest for 18.1%, water bodies for 6.7% and agriculture for 4.1% (Senate Department for the Environment., 2019). Berlin is located in a zone of moderate continental climate with a low mean annual precipitation of 570 mm and mean annual temperatures of 8 °C to 10 °C (Senate Department for Urban Development and Housing., 2018).

Study species
The red squirrel is a small, diurnal mammal widely distributed throughout the forests of Europe and northern Asia (Thorington et al., 2012). The International Union for Conservation of Nature lists the species as Least Concern, but highlights a decreasing population trend, mainly because of the loss of natural habitats (Thorington et al., 2012; Turkia et al., 2018). However, red squirrels successfully dwell in urban habitats, where they can reach high abundances (Reher et al., 2016; Hämäläinen et al., 2018, 2020). Red squirrels are an example of a common urban adapter (Fischer et al., 2015), although their urban habitat suitability in the face of climate change and increasing urbanization remains an open question. Many people appreciate red squirrels (e.g., feed and care for them) due to their charismatic appearance (Lurz and Bosch, 2012; Shuttleworth et al., 2015). In our study area, red squirrels occur frequently, but no reliable estimate of their population size exists. They share the city with other typical urban dwellers from various taxonomic groups. These include four mesocarnivore species, among them martens (Louvrier et al., 2021), and a community of ∼90 bird species with raptors such as the northern goshawk (Accipiter gentilis) (Planillo et al., 2020).

Data collection
We analyzed red squirrel observations from two CS projects with different sampling schemes (Figure 1). (1) Camera trap data (n = 669 camera trap locations) formed the semi-structured data set and were collected through a CS project ("Wildtierforscher Berlin"). Camera traps were set up in private or allotment gardens in five phases: autumn 2018, spring and autumn 2019, and spring and autumn 2020. Camera trap locations were selected by dividing the study area into 2 × 2 km grid cells (n = 287) to ensure spatial independence. Participants set up the camera traps following a strict protocol, i.e., in a corner of the garden approx. 50 cm above ground, pointing to an open area (e.g., the lawn) to capture a wide angle (Louvrier et al., 2021). Each phase consisted of four consecutive weeks. This resulted in a binary detection/non-detection matrix of four weekly occasions for each of the five phases (Table 2).
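The following Python sketch shows one way weekly detection histories of this kind could be assembled from raw camera records. The record format (camera id, phase label, day within the 28-day phase) and all names are hypothetical. The naive proportion of deployments with a detection ignores imperfect detection, which is what occupancy models correct for. In the study itself, the resulting matrix was analyzed with the R package unmarked.

```python
N_WEEKS = 4          # each phase lasted four consecutive weeks
DAYS_PER_WEEK = 7


def detection_history(deployments, detections):
    """Binary detection/non-detection history per (camera, phase).

    deployments: iterable of (camera_id, phase) pairs that were actually deployed.
    detections:  iterable of (camera_id, phase, day) records, with day counted
                 from 0 within the 28-day phase (hypothetical record format).
    Returns {(camera_id, phase): [w1, w2, w3, w4]}, where 1 means at least one
    red squirrel detection in that week and 0 means none.
    """
    history = {key: [0] * N_WEEKS for key in deployments}
    for camera, phase, day in detections:
        key = (camera, phase)
        if key not in history:
            raise ValueError(f"detection for undeployed camera {key}")
        week = day // DAYS_PER_WEEK
        if not 0 <= week < N_WEEKS:
            raise ValueError(f"day {day} lies outside the four-week phase")
        history[key][week] = 1  # several detections in one week still count as 1
    return history


def naive_occupancy(history):
    """Proportion of deployments with at least one detection (ignores imperfect detection)."""
    return sum(any(weeks) for weeks in history.values()) / len(history)


deployed = [("cam01", "autumn2018"), ("cam02", "autumn2018")]
records = [("cam01", "autumn2018", 3), ("cam01", "autumn2018", 5),
           ("cam01", "autumn2018", 22)]
h = detection_history(deployed, records)
print(h)                   # {('cam01', 'autumn2018'): [1, 0, 0, 1], ('cam02', 'autumn2018'): [0, 0, 0, 0]}
print(naive_occupancy(h))  # 0.5
```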
(2) In parallel, unstructured presence-only data of red squirrel sightings (n = 1,450) were collected through the platform "StadtWildTiere Berlin" (SWT), another CS project containing only opportunistic observations. The Leibniz Institute for Zoo and Wildlife Research has collected and verified participants' sightings since September 2018. The data considered here included all observations until November 2020. They were further pre-processed to reduce bias from inaccurate records (Appendix A: Presence-only data set pre-processing).

Environmental data
Environmental and anthropogenic covariates considered as explanatory variables for the occurrence of red squirrels were obtained from public online resources (Table 1). Covariates were selected based on their reported importance for red squirrel distribution in the existing literature (Lurz et al., 2005; Kopij, 2014; Krauze-Gryz and Gryz, 2015; Reher et al., 2016; Hämäläinen et al., 2018; Thomas et al., 2018). All covariates
Figure 1. Study area with sampling locations of red squirrels. Camera trap surveys (black and orange dots) represent a semi-structured sampling design; unstructured presence-only data are shown as red dots. Background map of Berlin created with the R package d6berlin (Scherer, 2021).

Species distribution models
We built five SDMs (M1-M5, Figure 2) with the semi-structured and unstructured data sets:

(M1) The unstructured (presence-only) data set was fitted with a MaxEnt model (Phillips et al., 2006a).
(M2) A second MaxEnt model was fitted including both the unstructured and the semi-structured data set.
(M3) The semi-structured data set was analyzed with single-season occupancy models accounting for imperfect detection (MacKenzie et al., 2002).
(M4) A combined likelihood model based on inhomogeneous Poisson point process models (IPPPM) integrated both data sets (Renner et al., 2019). It accounted for imperfect detection in both data sets via bias layers and for overfitting via an adaptive least absolute shrinkage and selection operator (LASSO) penalty. It combined both likelihoods via complementary log-log link functions. This model served as the reference model against which the other modeling approaches were compared.
(M5) The same model was also fitted as an area-interaction model (AIM) accounting for spatial dependence of observations (Renner et al., 2019).

All models (M1-M5) started from the same full set of environmental covariates. This set was then subjected to model-specific selection: the Area Under the Curve (AUC) for the MaxEnt models (M1-M2), the Akaike Information Criterion (AIC) for the occupancy models (M3) and the Bayesian Information Criterion (BIC) for the combined likelihood models (M4-M5). Consequently, the final models differ in their subsets of environmental variables and hence partially in their input parameters (Appendix B: Model parameters). Furthermore, model resolution varied where computational burden required coarser resolutions. Where stated, covariates were resampled by bilinear interpolation with the package raster (Hijmans, 2020).
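In the generic adaptive LASSO, each coefficient's penalty is weighted by the inverse of its magnitude in an initial fit. Covariates with small initial effects are therefore shrunk more strongly. The Python sketch below computes such weights and the weighted soft-thresholding step used by coordinate-descent solvers. It is a generic illustration with hypothetical numbers. It is not the implementation used by Renner et al. (2019), and it omits the Poisson likelihood itself.

```python
import math


def adaptive_weights(initial_coefs, gamma=1.0):
    """Adaptive-LASSO weights w_j = 1 / |b_j|**gamma from an initial (e.g. standard LASSO) fit.

    Coefficients the initial fit set exactly to zero receive an infinite weight,
    so they remain excluded from the adaptive fit.
    """
    return [math.inf if b == 0 else 1.0 / abs(b) ** gamma for b in initial_coefs]


def adaptive_penalty(coefs, weights, lam):
    """Penalty term lam * sum_j w_j * |beta_j| (zero coefficients contribute nothing)."""
    return lam * sum(w * abs(b) for b, w in zip(coefs, weights) if b != 0)


def weighted_soft_threshold(z, lam, weight):
    """Proximal step for lam * weight * |beta|: shrink z towards 0 by lam * weight."""
    return math.copysign(max(abs(z) - lam * weight, 0.0), z)


initial = [0.5, -0.05, 0.0]        # hypothetical coefficients from a standard LASSO run
w = adaptive_weights(initial)      # approx. [2.0, 20.0, inf]
print([weighted_soft_threshold(0.4, 0.1, wj) for wj in w])
```

The same raw update (0.4) survives for the covariate with a large initial estimate but is set to zero for the two with small or zero initial estimates. This illustrates why the adaptive variant tends to select fewer covariates than the standard LASSO.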

MaxEnt models for presence-only data
(M1) The unstructured data set was sampled irregularly across the study area. Hence, we used a bias file and a restricted background for the MaxEnt models to account for differences in sampling effort and area (Kramer-Schadt et al., 2013; Steen et al., 2019). To account for differences in sampling effort, we assigned raster cells without observations a value of 0.1 (10% probability of sampling). Accordingly, areas with overlapping observation buffers received a higher probability of being sampled by participants, corresponding to the number of observations. Additionally, we assumed that participants had sampled all areas within a buffer of 500 m around every observation (approx. 5 min walking distance; see Planillo et al., 2021). This restricted the background from which MaxEnt determines environmental variation (Phillips, 2008). We ran MaxEnt version 3.4.1 (Phillips et al., 2006b) including all environmental variables (resolution: 10 m), with the following settings: maximum iterations = 2,000, maximum background points = 100,000, replicates = 10 and logistic output. The observation points were randomly split into 80% training and 20% test sets for a 5-fold cross-validation. This was used to assess model performance and to obtain the AUC values used to compare the MaxEnt models (Fielding and Bell, 1997). All analyses were done in R 4.0.3 (R Core Team., 2020); MaxEnt models were built with the package dismo.
Table 1 (excerpt): Distance greenspace (m), i.e., distance to the closest greenspace. Processing: forest included as greenspace with the mosaic function from the R package raster (Hijmans, 2020).
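A minimal sketch of the bias-file logic described above is given below, assuming projected coordinates in metres and evaluating each raster cell at its centre. Under one reading of the text, the value of a cell is the number of 500 m observation buffers covering it, or 0.1 where no buffer covers it. The background mask keeps only covered cells. Function and argument names are hypothetical. The brute-force distance loop is only meant for small examples; in practice such layers would be built with raster tools.

```python
import math


def bias_surface(observations, cell_centres, radius=500.0, floor=0.1):
    """Relative sampling-effort value and background mask per raster cell.

    observations: list of (x, y) sighting coordinates in metres (projected CRS assumed)
    cell_centres: list of (x, y) raster cell centres in the same CRS
    Returns (bias, mask): bias is the number of observation buffers of `radius`
    covering the cell centre, or `floor` where none does; mask marks cells
    within `radius` of at least one observation (the restricted background).
    """
    bias, mask = [], []
    for cx, cy in cell_centres:
        n = sum(1 for ox, oy in observations if math.hypot(cx - ox, cy - oy) <= radius)
        bias.append(n if n > 0 else floor)
        mask.append(n > 0)
    return bias, mask


obs = [(0.0, 0.0), (300.0, 0.0)]
cells = [(0.0, 0.0), (150.0, 0.0), (700.0, 0.0), (2000.0, 0.0)]
print(bias_surface(obs, cells))  # ([2, 2, 1, 0.1], [True, True, True, False])
```

MaxEnt treats bias-file values as relative sampling intensity, so only the ratios between cells matter, not their absolute scale.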
(M2) The second MaxEnt model was run with the same settings, but additionally included the camera trap locations with red squirrel detections (n = 229). Camera trap locations were not considered biased by inaccurate sampling and were therefore not additionally corrected.
Occupancy model for camera trap data
(M3) We fitted single-season occupancy models. We chose a static model without colonization or extinction rates because camera trap locations varied spatially between sampling seasons [see Appendix B: Occupancy model (M3)]. However, seasonal effects were considered as possible covariates on detection and occupancy. We designed 18 candidate models guided by two main hypotheses:
(1) red squirrels are highly dependent on natural resources, especially patches of older trees;
(2) red squirrels avoid areas with increased human disturbance, such as impervious surfaces.
We selected the most promising model (Appendix B: Supplementary Figures 1-3) by AIC corrected for small sample size (AICc; Burnham et al., 2002) and AICc weight (AICcw). Models differing by ΔAICc > 2 were considered different. Models with an AICcw greater than 0.4 were considered to have more support and relative goodness-of-fit. All occupancy analyses were conducted with the R package unmarked (Fiske and Chandler, 2011). The model with the lowest AICc and highest AICcw was retained for predicting occupancy across the study area. For this model, we used the R package AICcmodavg (Mazerolle, 2020) to perform a MacKenzie and Bailey goodness-of-fit test. This test is a parametric bootstrap with 1,000 samples that compares observed and expected values with a chi-square (χ²) statistic and estimates an overdispersion parameter (MacKenzie et al., 2004; Mazerolle, 2020). In addition to the 10 m resolution of the environmental data, we applied the same set of occupancy candidate models at a coarser resolution of 500 m to also capture important features of the surroundings.
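The model-ranking quantities used above follow standard formulas. AICc = −2 log L + 2K + 2K(K + 1)/(n − K − 1), with K parameters and sample size n. Akaike weights are w_i = exp(−Δ_i/2) / Σ_j exp(−Δ_j/2), where Δ_i is the AICc difference to the best model. The Python sketch below computes both for hypothetical log-likelihoods and parameter counts. Taking n as the number of camera sites is an assumption (a common convention for occupancy models). The study itself used AICcmodavg in R.

```python
import math


def aicc(log_lik, k, n):
    """Second-order AIC: -2 logL + 2k + 2k(k+1)/(n-k-1); requires n > k + 1."""
    if n <= k + 1:
        raise ValueError("AICc undefined for n <= k + 1")
    return -2.0 * log_lik + 2 * k + (2 * k * (k + 1)) / (n - k - 1)


def akaike_weights(scores):
    """Delta values and weights exp(-delta/2) / sum(exp(-delta/2)) for a set of AICc scores."""
    best = min(scores)
    deltas = [s - best for s in scores]
    rel = [math.exp(-d / 2) for d in deltas]
    total = sum(rel)
    return deltas, [r / total for r in rel]


# Hypothetical candidate models: (name, log-likelihood, number of parameters)
candidates = [("psi(trees) p(season)", -410.2, 5),
              ("psi(impervious) p(season)", -412.9, 5),
              ("psi(.) p(.)", -425.0, 2)]
n_sites = 669
scores = [aicc(ll, k, n_sites) for _, ll, k in candidates]
deltas, weights = akaike_weights(scores)
for (name, _, _), d, w in zip(candidates, deltas, weights):
    # dAICc > 2 flags a clearly different model; weight > 0.4 flags stronger support
    flag = "supported" if w > 0.4 else ""
    print(f"{name:28s} dAICc={d:6.2f} w={w:.3f} {flag}")
```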
Combined likelihood models (IPPPM + AIM)
(M4) IPPPM: For the combined modeling framework considering both data sets, we fitted the combined penalized likelihood model presented in Renner et al. (2019).
The assumption of spatial independence of the observation locations was tested via inhomogeneous K-function simulation envelopes (Ripley, 1977; Diggle, 2013), based on Monte Carlo simulations (n = 10) in the R package spatstat (Baddeley and Turner, 2005). (M5) AIM: We considered a second approach because the presence-only locations indicated slight spatial dependence. We therefore fitted an area-interaction model (AIM) to both data sets instead of an IPPPM (Renner et al., 2019). This model measures the overlap of spatial buffers around observation points within a given distance (here: 200 m) and is fitted by maximum pseudo-likelihood (Besag, 1977).
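The inhomogeneous K-function can be written as K_inhom(r) = (1/|W|) Σ_{i≠j} 1(d_ij ≤ r) / (λ(x_i) λ(x_j)), where λ is the fitted intensity and |W| the window area. Values above those from simulated independent patterns (the envelope) indicate clustering at distance r. The Python sketch below implements this estimator without edge correction. spatstat's Kinhom does apply edge correction, so the sketch is a simplified illustration rather than a replacement. All inputs are hypothetical.

```python
import math


def k_inhom(points, intensities, area, radii):
    """Inhomogeneous K-function without edge correction.

    points:      list of (x, y) coordinates
    intensities: fitted intensity lambda(x_i) at each point, same order as points
    area:        area |W| of the observation window
    radii:       distances r at which K is evaluated
    """
    pairs = []  # (distance, 1 / (lambda_i * lambda_j)) for each unordered pair
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            d = math.dist(points[i], points[j])
            pairs.append((d, 1.0 / (intensities[i] * intensities[j])))
    # Each unordered pair appears twice in the ordered sum over i != j.
    return [2.0 * sum(w for d, w in pairs if d <= r) / area for r in radii]


pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 3.0)]
lam = [len(pts) / 10.0] * len(pts)      # constant intensity n/|W| with |W| = 10
print(k_inhom(pts, lam, 10.0, [0.5, 1.5, 3.5]))  # approx. [0.0, 2.22, 6.67]
```

Under complete spatial randomness with a correctly specified intensity, K_inhom(r) is close to πr². The envelope test compares the observed curve against curves from simulated patterns.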
Both models (M4 + M5) were run with 1,000 model fits and 25 iterations. Detection probabilities were modeled as a complementary log-log function of environmental variables. As the penalty, we used the adaptive lasso, informed by one prior run with the standard lasso penalty. The Poisson family with its canonical link was selected, and BIC was used for model selection (Schwarz, 1978). To model red squirrel distribution, we considered a smaller set of environmental covariates and used a stepwise selection of input parameters. Due to computational burden, environmental covariates were resampled by bilinear interpolation to 100 × 100 m grid cells with the raster package (Hijmans, 2020).
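The complementary log-log link allows presence/absence-type data to share parameters with a Poisson intensity. If points follow a Poisson process with intensity λ, the probability of at least one point in a cell of area A is p = 1 − exp(−λA). Therefore cloglog(p) = log(λ) + log(A). The short Python sketch below illustrates this identity with hypothetical values. It shows the general link only, not the full combined likelihood of Renner et al. (2019).

```python
import math


def cloglog(p):
    """Complementary log-log link: eta = log(-log(1 - p)), for 0 < p < 1."""
    return math.log(-math.log1p(-p))


def inv_cloglog(eta):
    """Inverse link: p = 1 - exp(-exp(eta))."""
    return -math.expm1(-math.exp(eta))


def presence_prob_from_intensity(log_intensity, cell_area):
    """P(at least one point in a cell) for a Poisson process with constant intensity in the cell.

    Equals 1 - exp(-lambda * area), i.e. inv_cloglog(log(lambda) + log(area)).
    """
    return inv_cloglog(log_intensity + math.log(cell_area))


# Hypothetical: intensity of 2 individuals per km^2 in a 0.5 km^2 cell -> 1 - exp(-1)
print(round(presence_prob_from_intensity(math.log(2.0), 0.5), 3))  # 0.632
```

Because log(A) enters only as an offset, cells of different size can be linked to the same intensity surface.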

Urban habitat connectivity
To identify the most important habitat patches, we used the reference model (M4) and delineated patches of 500 × 500 m for connectivity. We then applied the threshold that maximizes Cohen's kappa (MaxKappa), determined after model evaluation with 10,000 random background points in the R package dismo. Applying this cut-off yielded 384 important habitat patches, between which connectivity was assessed.
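The MaxKappa criterion chooses the suitability cut-off at which agreement between predicted and observed presences, measured by Cohen's kappa, is highest. The Python sketch below scans all observed scores as candidate thresholds and treats background points as pseudo-absences. Scores, names and the tie-breaking rule (first maximum wins) are illustrative assumptions. The study itself used dismo in R.

```python
def cohen_kappa(tp, fp, fn, tn):
    """Cohen's kappa for a 2 x 2 confusion matrix."""
    n = tp + fp + fn + tn
    observed = (tp + tn) / n
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    return 0.0 if expected == 1 else (observed - expected) / (1 - expected)


def max_kappa_threshold(presence_scores, background_scores):
    """Return (threshold, kappa) maximizing kappa; score >= threshold counts as 'suitable'."""
    best = (None, float("-inf"))
    for t in sorted(set(presence_scores) | set(background_scores)):
        tp = sum(s >= t for s in presence_scores)
        fn = len(presence_scores) - tp
        fp = sum(s >= t for s in background_scores)
        tn = len(background_scores) - fp
        k = cohen_kappa(tp, fp, fn, tn)
        if k > best[1]:
            best = (t, k)
    return best


presence = [0.8, 0.9, 0.7]            # hypothetical suitability at presence points
background = [0.1, 0.2, 0.3, 0.75]    # hypothetical suitability at background points
print(max_kappa_threshold(presence, background))  # threshold 0.7, kappa approx. 0.72
```

Cells with predicted suitability at or above the returned threshold would then be classified as habitat when delineating patches.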
We created a resistance map based on the habitat suitability map of the reference model and informed this cost layer with additional biological barriers and corridors based on the species' ecology. We extended the pure habitat suitability map with this ecological knowledge for a specific reason. Previous work indicates that connectivity models based on SDMs, although established, are rather conservative and provide lower connectivity values; they thus tend to underestimate flow between patches (Blazquez-Cabrera et al., 2016). Connectivity measures based solely on habitat suitability are potentially misleading (Scharf et al., 2018), but they can be further informed by expert opinion or biological traits (Stuart et al., 2021). We built the cost layer in three steps:

First, we inverted the habitat suitability values to assign low resistance values to highly suitable habitats and vice versa (Poor et al., 2012; Stevenson-Holt et al., 2014; Stuart et al., 2021).
Second, we treated all water bodies and roads with an average daily traffic of 25,000 vehicles or more (approx. 17 vehicles/min) as complete barriers to red squirrel crossings.
Third, we treated all trees (given as counts in the environmental layer) and forest patches as important connectivity corridors. We assigned tree raster cells low resistance values based on the number of trees, even where the underlying habitat suitability was low.

We chose to modify the inverted habitat suitability map in this way to obtain finer-scaled and more appropriate corridors. These corridors would also connect areas that are unsuitable as habitat but potentially important as connectivity corridors. All calculations were conducted using Circuitscape 4.0.5 (McRae et al., 2008), considering all possible connections between important habitat patches (see Appendix B: Circuitscape).
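The three steps of the resistance map could be combined per raster cell roughly as in the Python sketch below. Several choices in it are hypothetical and made for illustration only:

- the linear rescaling of suitability to resistances between 1 and 100;
- the tree-count rule (resistance 10 divided by the number of trees, but at least 1);
- setting forest cells to the minimum resistance;
- applying barriers last, so that they override corridors.

Only the 25,000 vehicles/day barrier comes from the text. Complete barriers are returned as infinite resistance. How such barriers are encoded in a Circuitscape input grid (e.g. as NoData cells) is not specified here and is an assumption.

```python
import math

TRAFFIC_BARRIER = 25_000  # vehicles/day; roads at or above this are complete barriers


def resistance_cell(suitability, n_trees=0, is_forest=False, is_water=False,
                    daily_traffic=0, max_resistance=100.0):
    """Resistance of one raster cell (hypothetical rules, see lead-in).

    suitability: habitat suitability in [0, 1] from the SDM
    n_trees:     number of trees in the cell
    """
    # Step 1: invert suitability; linear rescaling to [1, max_resistance] is an assumption.
    r = 1.0 + (max_resistance - 1.0) * (1.0 - suitability)
    # Corridors: trees and forest lower resistance regardless of suitability.
    if is_forest:
        r = 1.0
    elif n_trees > 0:
        r = min(r, max(1.0, 10.0 / n_trees))  # more trees -> lower resistance, floor of 1
    # Barriers applied last so they override corridors (assumption).
    if is_water or daily_traffic >= TRAFFIC_BARRIER:
        r = math.inf
    return r


# Hypothetical 2 x 3 landscape: (suitability, n_trees, is_forest, is_water, daily_traffic)
cells = [[(0.9, 0, False, False, 0), (0.1, 4, False, False, 0), (0.5, 0, True, False, 0)],
         [(0.7, 0, False, True, 0), (0.3, 0, False, False, 30_000), (0.0, 0, False, False, 5_000)]]
resistance = [[resistance_cell(*c) for c in row] for row in cells]
for row in resistance:
    print(["inf" if math.isinf(r) else round(r, 1) for r in row])
```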

Results
The semi-structured camera trap data set consisted of 669 sampling locations, of which 192 camera traps had at least one red squirrel detection. Detection rates differed among the five seasons: the proportion of camera traps with detections ranged from 0.24 to 0.39 (Table 2). The unstructured presence-only data set originally comprised 1,450 observations, but spatial filtering reduced these to 800 observations eligible for inclusion in SDMs. The environmental gradients covered by the sampling locations differed between the two sampling schemes. The semi-structured data set was distributed closer to the outskirts, whereas presence-only observations were biased towards urbanized areas (Figure 3).
To test the performance of our reference model (M4), the results of the other modeling approaches were assessed in direct comparison (Figure 5). The IPPPM (M4) estimated the sampling bias separately for each data set. This revealed a very strong bias towards the city centre in the unstructured data set (Figure 4C) and only a minor bias in the semi-structured camera trap data set (Figure 4B). However, the inhomogeneous K-function simulation envelopes still indicated slight spatial clustering, even after spatial filtering. Consequently, area-interaction models were additionally tested instead of point processes (M5, Appendix B: Supplementary Figures 6-8) and selected based on the Kinhom function of the R package spatstat. The adaptive lasso penalty (0.0545) was applied as a safeguard against overfitting, reducing the influence of covariates in the global model.

Model estimates indicate that red squirrels are negatively related to impervious surfaces (β = −0.221 ± 0.098) and to areas with high focal green capacities (β = −0.167 ± 0.076). They are positively related to short distances to greenspaces (β = −0.213 ± 0.089) and to tree patches with older trees (β = −0.175 ± 0.076). Areas with higher human population density tended to be avoided (β = −0.07 ± 0.032). Suitable habitats were largely predicted in suburban areas, but also in larger parks, graveyards and housing areas in more urbanized parts (Figure 4A).

(M1/M2) Both MaxEnt models showed similar results, which contrasted with those of the IPPPM (M4). They neglected highly suitable habitat identified by the reference model and assigned high suitability values to urban areas that are unlikely to provide considerable habitat (Figures 5A-D).
M1 (sighting data; training AUC = 0.777, test AUC = 0.821) and M2 (sightings + camera trap presences; training AUC = 0.717, test AUC = 0.798) both related the relative probability of occurrence to young trees and impervious surfaces. They differed in their results for the variables "distance to greenspace" and "distance to administrative border" (Appendix B: MaxEnt).

(M3) In the occupancy model, red squirrel occupancy was positively influenced by high focal green capacities, small amounts of impervious surface and higher human population densities, but negatively influenced by proximity to greenspaces (Figure 5E). However, this global model should be considered with caution. The MacKenzie and Bailey goodness-of-fit test showed high overdispersion (ĉ > 4, p < 0.05), indicating that the results are highly questionable and cannot be adjusted (Mazerolle, 2020). Compared with the reference model, this model clearly neglected suitable habitats in the urban parts and potentially overpredicted suitability in suburban areas (Figure 5F).
The connectivity model based on M4 identified corridor networks across the whole metropolitan area of Berlin. However, areas crucial for connectivity were mostly identified in more urbanized areas, where resources for connectivity are rare (Figure 6A). In these areas, the model highlights the importance of urban parks and higher tree densities. After applying a fixed threshold to the areas assigned high importance for connectivity, a few parks and graveyards could be identified as connectivity bottlenecks. Other areas crucial for connectivity (Figure 6B) were located mostly along very large streets, which can be crossed only at very few points (e.g., bridges, tunnels).

Discussion
In this study, we compared SDMs based on single or multiple CS data sets with different observational biases, data quality and sampling schemes. As expected, the integrated model estimated red squirrel distribution more plausibly, highlighting the advantages of considering multiple data sources for SDMs (Isaac et al., 2020). In the following, we first discuss the novelty of the statistical approach for integrating CS data in an urban context. We then discuss the findings in light of urban wildlife conservation and green infrastructure planning.

Advantages of data-integration approaches
Although data-integration techniques in SDMs are relatively recent, their potential has been shown in both simulation and applied studies. Dorazio (2014) conducted simulation-based comparisons and found that even limited additional observations from a second data source improved SDM performance. Koshkina et al. (2017) combined occupancy data and presence-only data via IPPPM and evaluated model performance using simulated data and a subsequent application. Fletcher et al. (2016) combined camera trap data with presence-only observations obtained from a CS project, but ignored imperfect detection in the observation process.
In this study, sparse and potentially biased camera trap data alone estimated red squirrel distributions inadequately. After integrating the presence-only observations sampled by participants, we were able to show that red squirrel habitat is present in both semi-natural and urban areas. To our knowledge, this describes urban red squirrel habitats in Berlin adequately and agrees well with previous studies (Kopij, 2009, 2014; Reher et al., 2016; Thomas et al., 2018). However, model evaluation and validation for integrated models remain challenging, especially if known biases exist or prediction errors are likely influenced by data quantity or quality. Hence, traditional model assessment tools partially conflict with the principle of data integration itself, namely to include the maximum amount of available data (Isaac et al., 2020).

Differences in suitability predictions
To understand the differences among SDMs and the potential of data-integration approaches for urban planning, it is important to consider differences in sampling schemes and the accompanying biases. Unstructured sightings are often biased towards easily accessible (Geldmann et al., 2016) or biodiversity-rich areas and are not evenly distributed over the sampling area. Biases in camera trap designs, by contrast, are less obvious and harder to account for (Kays et al., 2009; Kolowski and Forrester, 2017). In unstructured sightings, potential bias mostly reflects spatial distortion of sampling locations, and corrections for it are well accepted and broadly used. In our case, models based on the unstructured data set neglected suitable habitats and produced nearly contrary results. They assigned higher suitability to densely urbanized areas with higher numbers of observations, even though we corrected for sampling biases; this has also been shown for other species in urban environments. Surprisingly, results based on the semi-structured data set also failed to estimate the distribution robustly and oversimplified the underlying environmental drivers. They therefore overestimated habitat suitability in the urban outskirts and omitted urban habitats in the center. This is possibly due to the bias of the CS project towards private gardens and allotments, which are usually not located within the city center (Louvrier et al., 2021). Additionally, the project's aim of detecting multiple species could have noticeably influenced detection rates (Meek et al., 2014; Dyson et al., 2019).
However, red squirrels are frequent visitors to private gardens (Baker and Harris, 2007). Compared with natural or semi-natural habitats, these gardens show a higher variety of plant species and in part higher densities of domestic animals (Paker et al., 2014; Louvrier et al., 2021). Hämäläinen et al. (2020) found an increase in red squirrel occurrence in urban areas, explained by changes in tree species composition and attraction to bird feeders. On the other hand, Magris and Gurnell (2002) found domestic cats to be the main cause of death of red squirrels, which was corroborated by Fey et al. (2016). Neglecting species interaction effects could hence further bias the results and decrease the reliability of SDMs (Kolowski and Forrester, 2017).
Besides spatial sampling effort, it is important to consider the detectability of species in surveys (Dickinson et al., 2010; Dorazio, 2014). Species in urban areas are potentially reported more easily in unstructured surveys because people can observe them with less effort. Observing them may be challenging, however, in areas with more vegetation cover and taller trees, which are typical of urban outskirts (Di Cerbo and Biancardi, 2013). Moreover, red squirrels in urban areas tend to be bolder than rural individuals, potentially increasing detection probabilities in urban areas where they are surrounded by humans (Uchida et al., 2016, 2019; Kostrzewa and Krauze-Gryz, 2020).

Suggestions for incorporating CS projects in urban planning
The reference model, which used a data-integration approach combining both data sets, showed how unstructured data can be leveraged in SDMs, but model accuracy could also be increased by improving sampling schemes in advance, for example by recording absences, repeat visits, or transect sampling. Furthermore, sampling locations could be restricted or extended so that sampling covers the environmental gradients more evenly and represents the background sufficiently. For example, notifications in apps could inform participants about insufficiently sampled areas and ask them to observe there to balance the sampling design. Nevertheless, there is a risk that integrating too many additional parameters ultimately decreases the motivation of volunteers (Rotman et al., 2012) and increases the effort for project coordinators as well.
Furthermore, collecting CS project observations in a more structured way in central registers would increase data consistency, as many projects target the same species over the same extent (Young et al., 2019). CS databases such as the semi-structured eBird project show that long-running, cross-border projects can make successful scientific contributions (Sullivan et al., 2009). Walker and Taylor (2017) successfully modeled population changes in migratory bird species based on eBird data, compared them with traditional survey methods, and found structured CS projects to be a promising avenue for distribution modeling. The accuracy of SDMs could also be increased by considering modeling techniques other than those examined in this study, for example joint species distribution models that integrate multiple species and take information from observations of other species into account, or latent variables that account for additional effects such as species interactions or missing environmental predictors (Warton et al., 2015). However, when multiple data sets are collected over an overlapping extent, there is a clear indication that data-integration approaches still outperform single-source models (Fletcher et al., 2016; Koshkina et al., 2017; Isaac et al., 2020; Fidino et al., 2022).
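The combined-likelihood idea behind such integrated models can be illustrated with a minimal sketch. This is not the model fitted in this study; it assumes (i) a shared log-linear intensity of true occurrence driven by a habitat covariate `x`, (ii) presence-only sightings modeled as a discretised inhomogeneous Poisson point process thinned by a hypothetical sampling-bias covariate `w`, and (iii) repeat-visit detection data linked to the same intensity through a complementary log-log link, one formulation commonly used for integrated SDMs. The function `joint_nll`, its parameters and all data are hypothetical.

```python
import numpy as np


def expit(z):
    """Inverse logit."""
    return 1.0 / (1.0 + np.exp(-z))


def joint_nll(params, x, w, area, po_cells, occ_cells, y):
    """Negative log-likelihood of a hypothetical integrated SDM.

    params    : (b0, b1, a0, a1, logit_p)
    x         : habitat covariate per grid cell (shared by both data sets)
    w         : sampling-bias covariate per grid cell (presence-only part only)
    area      : area of each grid cell
    po_cells  : grid-cell index of each presence-only sighting
    occ_cells : grid-cell index of each repeatedly surveyed site
    y         : (n_sites, n_visits) array of 0/1 detections
    Returns (total, presence-only part, occupancy part).
    """
    b0, b1, a0, a1, logit_p = params
    log_lam = b0 + b1 * x                  # shared log-intensity of occurrence
    log_lam_po = log_lam + a0 + a1 * w     # intensity thinned by sampling bias

    # Presence-only: inhomogeneous Poisson point process on a discretised grid
    po_ll = log_lam_po[po_cells].sum() - (np.exp(log_lam_po) * area).sum()

    # Occupancy: cloglog link ties psi to the same intensity
    psi = 1.0 - np.exp(-np.exp(log_lam[occ_cells]) * area[occ_cells])
    p = expit(logit_p)
    n_det = y.sum(axis=1)
    n_vis = y.shape[1]
    hist = p ** n_det * (1.0 - p) ** (n_vis - n_det)
    # Sites without detections are either occupied-but-missed or unoccupied
    site_lik = psi * hist + np.where(n_det == 0, 1.0 - psi, 0.0)
    occ_ll = np.log(site_lik).sum()

    return -(po_ll + occ_ll), -po_ll, -occ_ll


# Toy data (simulated, not the Berlin data sets)
rng = np.random.default_rng(1)
n_cells = 400
x = rng.normal(size=n_cells)
w = rng.normal(size=n_cells)
area = np.full(n_cells, 0.25)
po_cells = rng.choice(n_cells, size=60)
occ_cells = rng.choice(n_cells, size=30, replace=False)
y = rng.binomial(1, 0.3, size=(30, 4))

theta = np.array([0.0, 0.5, -1.0, 0.8, -0.5])
total, po_part, occ_part = joint_nll(theta, x, w, area, po_cells, occ_cells, y)
print(f"joint NLL = {total:.2f} (PO {po_part:.2f}, occupancy {occ_part:.2f})")
```

In practice the joint negative log-likelihood would be minimised numerically (for example with `scipy.optimize.minimize`). Because the habitat slope `b1` enters both components, each data set informs the habitat effect; in this formulation the presence-only intercept `a0` is confounded with `b0` and becomes identifiable only through the occupancy component.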

Urban red squirrel habitats and connectivity
As expected, the connectivity model identified important habitat patches and connectivity bottlenecks, highlighting the importance of urban green structures for red squirrel connectivity. A major corridor crosses the city along the river Spree, connecting several parks that serve as suitable habitat patches and potentially linking the city borders. However, the corridor does not fully follow the river because of large impervious construction sites, indicating that urban planning has already missed a major opportunity for wildlife-friendly urban development. This most critical area for connectivity could also benefit other urban wildlife and serve as a template for implementing crossing structures or further conserving existing habitat patches.
While habitat fragmentation is a major threat in natural habitats (Lurz et al., 2005), other factors likely determine red squirrel occurrence in smaller urban habitat patches. Previous studies of red squirrel habitats have assigned differing importance to fragmentation in urban areas. While Verbeylen et al. (2003) found permanent occupation only in patches of at least 5 ha, Koprowski (2005) found that fragmentation is not necessarily a problem for red squirrels if its negative consequences are dampened by multiple fragmented but high-quality habitat patches or by supplementary food. Another survey, by Hämäläinen et al. (2018), found red squirrels occurring even in single trees, in clear contrast to natural habitats. In general, red squirrels in urban areas tend to disperse over shorter distances (Fey et al., 2016; Selonen et al., 2018) due to energy savings or supplementary food, which ultimately decrease the need for longer dispersal. In this study, the importance of the identified critical nodes depends largely on the need to travel between patches, so there is a risk of overestimating their importance for overall connectivity. Surveys of urban red squirrels in Paris showed viable populations with high genetic variation, suggesting that urban fragmentation is less important (Rézouki et al., 2014). Thomas et al. (2018) also found lower sensitivity to fragmentation in urban red squirrels, but highlighted that any further fragmentation would reduce populations, indicating that the species is close to its limits. Furthermore, studies on the role of streets as barriers to connectivity report contrasting results, and there is a risk of overestimating their importance: Magris and Gurnell (2002) found that 36% of deaths were attributable to roads, whereas Fey et al. (2016) showed that roads did not influence dispersal movements, although squirrels still avoided them, and concluded that streets with infrequent traffic are more dangerous because squirrels are not used to traffic.
Here, we used circuit theory, which is based on electric current flow, to analyze red squirrel habitat connectivity, yet many other modeling options with different applications are available, including least-cost path modeling, current flow, factorial least-cost path density, resistant kernels, and the randomized shortest path algorithm (for an overview, see Simpkins et al., 2018; Diniz et al., 2020). However, omnidirectional methods such as Circuitscape often yield similar results (Phillips et al., 2021), and comparisons with empirical data have shown that they reliably estimate connectivity corridors, especially when the species explores the landscape randomly, for example during dispersal (McClure et al., 2016). We did not have access to additional movement data (e.g., from collared animals) to identify the true range of movements in Berlin (sensu LaPoint et al., 2015); such research should therefore be considered in future investigations of the urban population dynamics of red squirrels.
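Circuit theory treats the resistance surface as an electrical network of cells joined by resistors. The minimal sketch below shows how an effective resistance between two habitat nodes follows from Kirchhoff's laws. It assumes a 4-neighbour connection scheme and edge resistances equal to the mean of the two adjacent cell resistances (our assumption about a common setting, not necessarily the configuration used in this study); the function names and the raster are hypothetical, and Circuitscape itself uses sparse solvers on far larger grids.

```python
import numpy as np


def grid_laplacian(resistance):
    """Laplacian of a 4-neighbour raster graph.

    Each pair of adjacent cells is joined by a resistor whose resistance is
    the mean of the two cell resistances (conductance = 1 / that mean).
    """
    n_rows, n_cols = resistance.shape
    lap = np.zeros((n_rows * n_cols, n_rows * n_cols))
    for r in range(n_rows):
        for c in range(n_cols):
            for dr, dc in ((0, 1), (1, 0)):      # right and down: each edge once
                rr, cc = r + dr, c + dc
                if rr < n_rows and cc < n_cols:
                    g = 1.0 / (0.5 * (resistance[r, c] + resistance[rr, cc]))
                    i, j = r * n_cols + c, rr * n_cols + cc
                    lap[i, i] += g
                    lap[j, j] += g
                    lap[i, j] -= g
                    lap[j, i] -= g
    return lap


def effective_resistance(resistance, source, ground):
    """Inject 1 A at `source`, hold `ground` at 0 V; return R_eff and voltages."""
    n_cols = resistance.shape[1]
    lap = grid_laplacian(resistance)
    src = source[0] * n_cols + source[1]
    gnd = ground[0] * n_cols + ground[1]
    keep = np.array([k for k in range(lap.shape[0]) if k != gnd])  # drop grounded node
    injected = np.zeros(lap.shape[0])
    injected[src] = 1.0
    volts = np.zeros(lap.shape[0])
    # Kirchhoff's current law on the remaining nodes: L_reduced @ v = i
    volts[keep] = np.linalg.solve(lap[np.ix_(keep, keep)], injected[keep])
    return volts[src] - volts[gnd], volts.reshape(resistance.shape)


# Toy 5 x 5 resistance raster (hypothetical values, not the Berlin surface)
open_land = np.ones((5, 5))
with_road = open_land.copy()
with_road[:, 2] = 50.0                   # a high-resistance "road" across the map

r_open, _ = effective_resistance(open_land, (2, 0), (2, 4))
r_road, volts = effective_resistance(with_road, (2, 0), (2, 4))
print(f"R_eff open: {r_open:.2f}, with road: {r_road:.2f}")
```

Raising the resistance of one raster column, as a stand-in for a road, increases the effective resistance between the two nodes. Current maps of the kind used to delineate corridors are derived from the same voltages: by Ohm's law, the current through each cell-to-cell connection equals its conductance times the voltage difference across it.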

Conclusion
Combined data-integration approaches are more likely to estimate the true biological distribution of species, even if the input data sets lack a structured sampling design. Our study highlights that two common modeling techniques, MaxEnt and occupancy models, can omit or contradict the habitat identified when they are not combined, which underlines the importance of considering multiple data sources when urban planning is to incorporate conservation decisions. However, data-integration approaches for ecological studies require unified frameworks and the implementation of advanced statistical tools for model validation. Applying the data-integration approach, we identified critical hotspots for red squirrels in Berlin and delineated an important corridor bridging the forests outside the urban area. In this study, we showed how data-integration approaches can be used as a tool for combining multiple CS projects that often depend on different sampling schemes and efforts.

Data availability statement
The original datasets are available in a publicly accessible repository: https://github.com/EcoDynIZW/Grabow_2022_FrontEcolEvol.

Ethics statement
The animal study used a non-invasive method for monitoring free-ranging animals without any impact on them and thus did not require approval or review.

Author contributions
MG, JL, AP, and SK-S conceived the ideas and designed methodology. SKie, KB, MS, RH, and SKim collected the data and led the citizen science campaign. MG, JL, and AP analyzed the data. SD and TS supported project supervision. MG and SK-S led the writing of the manuscript. All authors contributed critically to the drafts and gave final approval for publication.

Funding
MG, SD, and SK-S are associated with the DFG Research Training Group "BioMove" (DFG-GRK 2118/2). JL was supported by funds from DAAD (Leibniz-DAAD Research Fellowships, 2018-57423756), I.Z.W. and by an IPODI grant from the TU Berlin. SKie and AP were supported by the German Federal Ministry of Education and Research BMBF within the Collaborative Project "Bridging in Biodiversity Science - BIBS" (funding no. 01LC1501). RH, KB, and SKim were supported by the BMBF-funded project WTImpact (funding no. 01IO1725). This work was funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation), project no. 491292795.