Associations Between the Density of Oil and Gas Infrastructure and the Incidence, Stage and Outcomes of Solid Tumours: A Population-Based Geographic Analysis

Background: We hypothesized that there are geographic areas of increased cancer incidence in Alberta, and that these are associated with high densities of oil and gas (O+G) infrastructure. Our objective was to describe the relationship between O+G infrastructure and incidence of solid tumours on a population level.
Methods: We analyzed all patients >=18 years old with urological, breast, upper GI, colorectal, head and neck, hepatobiliary, lung, melanoma, and prostate cancers identified from the Alberta Cancer Registry from 2004 to 2016. Locations of active and orphan O+G sites were obtained from the Alberta Energy Regulator and Orphan Well Association. Orphan sites have no entity responsible for their maintenance. ArcGIS (ESRI, Toronto, Ontario) was used to calculate the distribution of O+G sites in each census dissemination area (DA). Patient residence at diagnosis was defined by postal code. Incidence of cancer per DA was calculated and standardized. Negative binomial regression was performed with O+G site density as a categorical variable with cutoffs of 1 and 30 wells/100 km², compared to areas with 0 sites.
Results: A total of 125,316 patients were identified in the study timeframe; 58,243 (46.5%) were female, and mean age was 65.6 years. Breast (22%) and prostate (19.8%) cancers were most common. Mortality was 36.5% after a median of 30 months follow-up (IQR 8.4-68.4). For categorical density of active O+G sites, RR was 1.02 for 1-30 sites/100 km² (95% CI=0.95-1.11) and 1.15 for >30 sites/100 km² (p<0.0001, 95% CI=1.11-1.2). For orphan sites, RR was 1.25 (p<0.0001, 95% CI=1.16-1.36) for 1-30 sites/100 km² and 1.01 (p=0.97, 95% CI=0.7-1.45) for >30 sites/100 km². For all O+G sites, RR was 1.03 (p=0.4328, 95% CI=0.95-1.11) for 1-30 sites/100 km² and 1.15 (p<0.0001, 95% CI=1.11-1.2) for >30 sites/100 km².
Conclusion: We report a statistically significant correlation between O+G infrastructure density and solid tumour incidence in Alberta. To our knowledge this is the first population-level study to observe that active and orphan O+G sites are associated with increased risk of solid tumours. This finding may inform policy on remediation and cancer prevention.

Keywords: oil, solid tumour, epidemiology, gas, Geographical Information System (GIS), geography, environment-toxicity, population

HIGHLIGHTS
Question: Is there a relationship between oil and gas production facilities and cancer incidence and severity?
Findings: Population-level geographic study correlating adjusted incidence of solid tumours with density of active and orphan oil and gas facilities. We found a statistically significant association between cancer incidence and site density for almost all tumour types but did not find any association with mortality or distant metastases.

INTRODUCTION
There is increasing recognition of the role of environmental factors in population health. In countries or regions with high oil and gas production such as Canada, this conversation often revolves around petrochemical plants and oil and gas (O+G) infrastructure (1). Oil and gas installations may pose a risk to the health of those who live in close proximity to them (2). However, it is unclear whether living close to these facilities poses a risk for cancer development overall, or whether certain cancer types are more likely to occur.
Several previous studies have noted correlations between residence near or employment at O+G-related sites and increased cancer incidence (3)(4)(5). Taken together, these data suggest that there may be a link between proximity to petrochemical sites and cancer incidence. However, these studies are limited by analysis of small patient numbers, a single tumour type or a single industrial site. A population-based analysis of multiple tumour types and large numbers of patients in a single region may allow a more robust assessment of these correlations. Moreover, the associations between tumour stage at diagnosis and density of conventional oil and gas facilities are not well studied. Conventional O+G production refers to the drilling, production, and transportation of subsurface oil and gas, as opposed to oil sands or offshore production. Orphan facilities are those for which there is no corporate or individual entity responsible for their operation or remediation. A difference in cancer outcomes (such as cancer-specific survival) among areas with varying densities of O+G infrastructure has not been demonstrated in the literature. Routes of contamination around O+G sites identified in the existing literature include air contamination and contamination of groundwater (1,5). We hypothesize that groundwater or air pollution is the most likely mechanism for an association of O+G infrastructure with cancer incidence, although the scope of this study does not encompass identifying this mechanism.
The province of Alberta, Canada, is an excellent study area for such an analysis as both health and O+G related data are available for the same geographic area. We hypothesized that there are areas of Alberta in which a higher incidence of cancer is correlated with increased geographic density of O+G infrastructure. We further hypothesized that facilities that are orphaned or incompletely remediated will have a greater effect on cancer incidence than actively licensed facilities. This study may inform public health efforts and provide information to guide remediation activities in the areas of highest risk.

Study Cohort and Data Sources
This study received ethics approval from the Conjoint Health Research Ethics Board at the University of Calgary. This was a retrospective, population-based geographic analysis incorporating prospectively collected data from the provincial, population-based Alberta Cancer Registry (ACR) and the 2011 Canada census. All adult patients (>=18 years) who were diagnosed with solid malignant tumours (including breast, colorectal, gastric, lung, pancreatic, head and neck, hepatobiliary, renal, bladder, and prostate cancer) between January 1, 2004 and January 1, 2016 in Alberta were included. Patients who did not have a valid healthcare number were excluded. Patients with multiple cancers were included once based on the first incident cancer. Patient demographics (age, sex, and postal code) and tumour characteristics (such as tumour type and stage) were obtained from the ACR. Cancer treatment data and patient factors such as comorbidity index are prospectively collected in the ACR. The demographics (age, sex, neighbourhood income level and education levels) of the general population during the same timeframe were retrieved from census data. Patient location was defined using the postal code of residence at the time of diagnosis. Location data for active O+G installations were obtained from publicly available data maintained by the Alberta Energy Regulator (AER), and for orphan oil and gas installations from the Alberta Orphan Well Association (OWA). The OWA data file was accessed on March 3, 2019, and the AER data were accessed on May 5, 2019. The OWA is an industry-funded body that takes overall responsibility for orphan installations in Alberta. For the purposes of this study, "sites", "facilities" and "installations" were all considered synonymous and refer to all O+G infrastructure.

Statistical Analysis
O+G facility distribution analysis was performed by plotting the geographic location of each O+G installation, as provided by the AER and the OWA, on a base map of Alberta census area polygons obtained from Statistics Canada (Statistics Canada, Ottawa, Ontario). These census areas are known as Dissemination Areas (DAs). DAs contain similar population sizes but cover different geographic areas. Prior to analysis, the data sources were inspected and non-relevant well and facility types, such as water wells, were removed. We used ArcGIS Pro to calculate the geographic density of O+G installations in each DA as the number of installations/100 km². Patient locations were identified by postal code, and these postal codes were mapped to DAs using the Postal Code Conversion File available from Statistics Canada (Statistics Canada, Ottawa, Ontario).
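As an illustration of the density measure described above, the following minimal Python sketch computes installations per 100 km² from a site count and a DA area. It is not the ArcGIS workflow used in the study; the function name and the example values are hypothetical.

```python
def density_per_100_km2(n_sites: int, area_km2: float) -> float:
    """Return the number of O+G installations per 100 km^2 of DA area."""
    if n_sites < 0:
        raise ValueError("n_sites must be non-negative")
    if area_km2 <= 0:
        raise ValueError("area_km2 must be positive")
    return 100.0 * n_sites / area_km2


# Hypothetical DA (illustrative values only): 12 installations in 48 km^2
example_density = density_per_100_km2(12, 48.0)
print(example_density)  # 25.0
```

Normalizing by area in this way is what allows geographically large rural DAs and small urban DAs to be compared on the same scale.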
The crude incidence rate of each cancer in each DA was calculated as the number of cancer cases divided by the population at risk. The adjusted incidence rates for each cancer and for all cancers in each DA were calculated using logistic regression (adjusted for age, sex, neighbourhood income level, and education level). Income level was defined as the mean income in a patient's DA. The cut point used in the regression adjustment was the median income for all cancer patients in the province, which includes our subset. Education level was defined as the proportion of people with high school education or higher in a patient's DA. The cut point used in the regression adjustment was 80%, the median value for all cancer patients in the province. Urban (vs. rural) residence was defined as residence in a municipality with a population greater than 30,000.
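The crude rate defined above can be sketched in a few lines of Python. This is an illustration only: the function name and example values are hypothetical, the scaling to 100,000 population follows the units reported in the results, and the sketch ignores the time dimension because the text does not state whether rates were annualized.

```python
def crude_incidence_per_100k(cases: int, population_at_risk: int) -> float:
    """Crude incidence: cases divided by population at risk, per 100,000."""
    if cases < 0:
        raise ValueError("cases must be non-negative")
    if population_at_risk <= 0:
        raise ValueError("population_at_risk must be positive")
    return 100_000 * cases / population_at_risk


# Hypothetical DA (illustrative values only): 2 cases among 600 residents
example_rate = crude_incidence_per_100k(2, 600)
print(round(example_rate, 1))  # 333.3
```

The adjusted rates reported in the study additionally account for age, sex, income, and education, which this crude calculation does not attempt.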
Negative binomial regression was performed to determine the association between density of O+G infrastructure and cancer incidence for each DA. Subgroup analyses were conducted separately for active, orphan, and total O+G sites, and for each tumour type. The O+G density (number of O+G facilities/100 km²) was categorized into three groups: 0, 1-30, and >30 O+G facilities/100 km². The 1-30 facilities/100 km² category was chosen as it encompasses the mean number of facilities per 100 km² in areas with O+G infrastructure.
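The three-group categorization can be expressed as a small Python function. This is a sketch under one stated assumption: the text does not say how densities strictly between 0 and 1 facility/100 km² were handled, so here any non-zero density up to and including 30 is placed in the middle group. The function name and labels are hypothetical.

```python
def density_category(density: float) -> str:
    """Map a facility density (per 100 km^2) to one of three groups.

    Assumption (not stated in the text): any non-zero density up to and
    including 30 is assigned to the "1-30" group.
    """
    if density < 0:
        raise ValueError("density must be non-negative")
    if density == 0:
        return "0"      # reference group: no O+G facilities
    if density <= 30:
        return "1-30"
    return ">30"


# Illustrative values only
for value in (0.0, 12.5, 30.0, 45.0):
    print(value, density_category(value))
```

In a regression, the "0" group would serve as the reference level, so that each reported ratio compares a density group with DAs containing no facilities.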
Multivariable logistic regression models were constructed to assess the associations between O+G installation density and presentation with stage IV cancer, for all cancers and for each tumour type individually. The covariates included patient age, sex, rural (vs. urban) residence, income, education level, treating institution (academic vs non-academic), healthcare zone (Calgary, Edmonton, North, South and Central) and Charlson comorbidity index.
Survival analysis was performed using multivariable Cox regression to investigate the effect of O+G installation density on overall survival (OS) and cancer-specific survival (CSS) for all cancer patients. The Cox regression model was adjusted for patient age, sex, tumour grade, tumour stage, treatment (e.g. surgery, chemotherapy, radiation, and hormone therapy), rural (vs. urban) residence, income, education level, treating institution (academic vs non-academic), healthcare zone and Charlson comorbidity index.
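For readers less familiar with the Cox model, its general form can be written as below. This is the standard proportional hazards formulation, not a specification taken from the study; the symbols are introduced here for illustration, and it is an assumption that density entered the model as the same two categorical indicators (1-30 and >30 facilities/100 km², with 0 as the reference) used for the incidence analysis.

```latex
h(t \mid D, \mathbf{x}) = h_0(t)\,
  \exp\!\left(\beta_1 D_{1\text{-}30} + \beta_2 D_{>30}
  + \boldsymbol{\gamma}^{\top}\mathbf{x}\right),
\qquad
\mathrm{HR}_k = e^{\beta_k}, \quad k = 1, 2
```

Here $h_0(t)$ is the unspecified baseline hazard, $D_{1\text{-}30}$ and $D_{>30}$ are indicator variables for the density groups, and $\mathbf{x}$ collects the adjustment covariates listed above. A hazard ratio of 1 corresponds to no difference in hazard relative to DAs without O+G facilities.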
Maps were produced using ArcGIS Pro 10.6.1 software (ESRI Canada, Toronto, Ontario). All statistical analyses were performed with SAS version 9.4 (SAS Institute, Inc., Cary, NC).

Patients and Demographics
Patient demographic data are summarized in Table 1. 125,208 cancer patients were included in the study, 46.5% of whom were female. Median age was 66 (IQR=57-75) years. The most common cancers were breast (22%), prostate (19.8%), lung (16.7%), and colorectal (15.7%). Overall, 46.4% of patients died during the follow-up period, of which 36.5% were due to cancer. A total of 27,246 (21.8%) patients were stage IV at diagnosis.

Geographic Distribution of Oil and Gas Installations
There were 4,827 DAs and 487,413 O+G facilities in Alberta at the time of data access, with 5,592 (1.1%) orphan sites and 481,821 (98.9%) active sites. The mean number of O+G facilities/100 km² in Alberta was 40 (range 0-231, SD=28), with the median being 0 (IQR=0-0). Most of the DAs with the highest density of installations were in the eastern parts of the province (Figure 1). A total of 3,921 (81%) DAs had no O+G infrastructure within them (Table 2), the majority of which were in urban areas.

Distribution of Cancer Incidence
Mean adjusted incidence rate of total cancers per DA was 341/100,000 population (range 0-2458, SD=183). Median adjusted incidence rate per DA was 307/100,000 population (IQR=224.8-388.8) (Table 1). There was variation in incidence rate among the DAs, with the highest incidence rates in the eastern parts of the province as displayed in Figure 1.

Association of Cancer Incidence With Oil and Gas Facility Density
Results for the association between O+G facility density and cancer incidence can be found in Table 3. For all O+G installations, when density was treated as a categorical variable with zero density as the reference, the incidence rate ratio (IRR) was 1.03 (p=0.43, 95% CI=0.95-1.11) for DAs with 1-30 facilities/100 km² and 1.15 (p<0.0001, 95% CI=1.11-1.2) for DAs with >30 facilities/100 km².
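An incidence rate ratio from a log-link count model such as negative binomial regression is conventionally obtained by exponentiating the model coefficient, with a Wald-type confidence interval exponentiated from the log scale. The Python sketch below shows that arithmetic. The function name is hypothetical, and the coefficient and standard error are illustrative values chosen only to be of similar magnitude to the reported results; they are not taken from the study's model output.

```python
import math


def irr_with_ci(beta: float, se: float, z: float = 1.96):
    """Return (IRR, lower, upper) from a log-scale coefficient and its SE.

    Uses the usual Wald interval: exp(beta +/- z * se).
    """
    if se < 0:
        raise ValueError("se must be non-negative")
    return (math.exp(beta),
            math.exp(beta - z * se),
            math.exp(beta + z * se))


# Illustrative inputs only (not study output)
irr, lower, upper = irr_with_ci(beta=0.14, se=0.02)
print(f"IRR={irr:.2f}, 95% CI={lower:.2f}-{upper:.2f}")
```

Because the interval is symmetric on the log scale, it is slightly asymmetric around the IRR itself, which is the pattern seen in the intervals reported in Table 3. An interval that includes 1 corresponds to no statistically significant difference from the reference group.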
Subgroup analysis by tumour type showed that increased cancer incidence was associated with higher O+G density (>30 total facilities/100 km²). These tumours included breast (

Association of Metastasis at Presentation With Oil and Gas Facility Density
For total, orphan, and active facilities, there were no statistically significant correlations between O+G facility density and metastasis at presentation (Table 3).

Association of Survival With Oil and Gas Facility Density
Survival analysis revealed no negative effect of location near >30 O+G facilities/100 km² on overall or cancer-specific survival. For active sites, Hazard Ratio (HR) for overall survival (OS) was 1.0  Table 3). There was no association between O+G facility density and OS or CSS for individual tumour types (Table 4).

DISCUSSION
To date, several studies have reported on cancer incidence in populations residing near industrial sites in various locations. Ghazawi et al. mapped postal code data of over 18,000 Canadian patients and identified a rate of acute myeloid leukaemia greater than three times that of the national average in Sarnia, Ontario, a city known for its numerous chemical plants and oil refineries (3). In a meta-analysis, Wong et al. reported increased incidence of skin cancer in some groups of refinery workers in the UK and upstream oil workers in Canada, although no mechanism for this finding was identified by the authors (4). A systematic review published in 2019 identified three studies which showed excess cancer mortality in oil-extracting regions of Ecuador. The review further reported a study performed in Colorado which showed that children with acute lymphocytic leukaemia were 4.3 times as likely as controls to reside near active oil and gas wells (2). A study conducted in Alberta, Canada identified increased levels of 43 volatile organic compounds (VOCs) in the area downwind of a large petrochemical complex, 10 of which are known, probable, or possible carcinogens. The authors found increased rates of male hematopoietic malignancies in the same geographic area as compared to surrounding municipalities and the entire province (5).
This study investigated the possible correlation of high densities of O+G infrastructure with cancer incidence. To our knowledge, this is the first study that reports this correlation on a population level in the context of various common solid tumours and different types (active and orphaned) of conventional oil and gas production. The main finding was that cancer incidence was associated with increased density of O+G infrastructure. It is possible that the larger number of active O+G facilities within a DA increases the potential exposure to industrial carcinogens, and therefore increases cancer incidence. This is in keeping with the findings of other studies.
Notably, residing in areas with orphan wells at low densities was associated with an elevated risk of cancer. This may be because orphan wells lack appropriate remediation or adequate abandonment and are not actively maintained by any proprietor, which may increase environmental contamination and therefore the risk for nearby inhabitants. There is little direct evidence for this, but previous studies have found an increased risk for contamination from orphan and abandoned wells. Kang et al. in 2014 reported increased methane emissions from abandoned oil and gas wells in Pennsylvania, with some of the highest emitters releasing methane at flow rates 3 orders of magnitude higher than the median for wells in that area (6). We suspect that the diminished risk ratio in areas of higher orphan well concentration (>30 facilities/100 km²) is due to the low number of areas with these concentrations. The finding that orphan wells have a stronger association than active wells with cancer incidence may point to an effect of increased contamination near orphan sites, although we have not identified a biological mechanism for this association. This difference in incidence rate ratio is particularly pronounced given the much smaller overall numbers of orphan sites in Alberta.
The association between cancer incidence and O+G facility density was robust among most of the solid tumour types captured in our database. The exceptions to these were pancreatic and bladder cancers. This finding is counterintuitive given that these tumours are diverse in terms of their oncogenesis, risk factors, and clinical behaviour. However, there are several studies which have investigated links between exposure to petroleum products and risk of developing solid malignant tumours. These studies have reported increased risk of rectal, skin, renal, gastric, lung, and prostate cancers in people with long-term occupational or residential exposures to petroleum refineries or products (7)(8)(9)(10)(11)(12). These studies, when taken together, suggest a time-dependent risk of oncogenesis in people exposed to hydrocarbons. Studies published by Peters et al. and Kachuri et al. respectively suggested that exposure to diesel and gasoline emissions for periods of greater than ten years would be necessary for increased cancer risk (8,9).
We did not find an association between metastasis at presentation or cancer specific survival and density of O+G infrastructure. This suggests that even though there is an association with the development of cancer, this is not associated with more advanced disease at presentation or worse survival. We assessed stage IV patients separately because we hypothesized that living near O+G infrastructure might be associated with the development of more aggressive cancer phenotypes which might present at more advanced stages. An Australian industry-wide study of more than 18,000 petrochemical workers found an increased incidence of melanoma, mesothelioma, prostate cancers, renal cancers, and leukaemia, but no excess mortality compared to the wider population (13). This lack of association may reflect the impact of O+G exposures on developing cancers but not on the biology or behaviour of the malignancies once established. Assuming standard treatment according to cancer, stage, and individual patient characteristics, it would be expected that outcomes would be similar to unexposed individuals. While it is possible that these patients had ongoing exposures to environmental contaminants during treatment and recovery, these exposures did not hamper the success of their treatments.
The study has limitations. A common challenge for studies using population-based data is identifying or quantifying individual exposures. We could not assess exposure time for individual patients to determine if their duration of habitation in these areas explained or contributed to the differences in cancer incidence. Another limitation is that we are unable to identify the actual contaminants, if any, to which individual patients are exposed, and we do not have data to identify which contaminants, if any, are enriched in these areas. Some of these contaminants are not related to O+G industry activity, such as radon or vehicle exhaust pollution. This also limits our ability to comment on a biological mechanism for the increase in cancer incidence, although previous studies have identified increased air, water, and soil contamination in proximity to O+G extraction sites (2). We are also unable to control for common carcinogenic exposures or for the possibility that people work in areas other than their primary residence, which would similarly alter their exposures. We feel that these possibilities are somewhat mitigated by our large number of patients and by the fact that the correlation was noted among multiple different tumour types. This is particularly true of the association of breast cancer incidence with O+G site density. Only 15% of O+G field workers are women, and therefore if occupational exposures were the main contributor to increased cancer incidence we would not expect breast cancer to be among the affected tumour types (14). Finally, the use of postal code to geolocate patients is imprecise. Some DAs are geographically large and O+G facilities are not uniformly distributed. Therefore, not all people residing in a DA will have the same risk of exposure to O+G infrastructure.
Despite the limitations, this study is one of the first to identify a significant correlation between residence near O+G infrastructure and cancer incidence. A unique feature of this study is that we were able to identify this correlation at a population level capturing all patients diagnosed with the common solid cancers in our province over 12 years. The large number of patients involved also provides strong statistical validity to our observations. Another advantage of this study is that the geographic area covered by the health administration in our province and the energy regulatory authority is identical. This is a situation which is rare if not unique among petroleum-producing areas. This provides the opportunity to use preexisting high-quality geographic and health data to explore associations between petrochemical extraction activities and human health.

CONCLUSION
In conclusion, this population-level geographic analysis identified a correlation between O+G facility density (active or orphaned wells) and solid tumour incidence. There was no association noted with distant metastasis or survival. There are limitations which reduce our ability to identify which contaminants might be responsible or eliminate potential confounders. These findings may inform future studies to identify specific exposure risks from habitation near O+G infrastructure as well as public health efforts aimed at remediation in our and other jurisdictions.

DATA AVAILABILITY STATEMENT
The datasets presented in this article are not readily available because the data is protected by privacy legislation in the country of origin and thus cannot be provided without a data transfer agreement. Requests to access the datasets should be directed to EJ, ejost1@jh.edu.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by the Health Research Ethics Board of Alberta-Cancer Committee. Written informed consent for participation was not required for this study in accordance with the national legislation and the institutional requirements.

AUTHOR CONTRIBUTIONS
EJ was involved in the conception, design, data analysis, writing, and editing. BD was involved in the conception, design, writing, and editing. CJ was involved in the conception, design, and data analysis. WC was involved in the conception, design, data gathering, and editing. MQ was involved in the conception, design, data analysis, writing, and editing. AB-F was involved in the study conception, design, and editing of the manuscript. SK was involved in the study conception, design, and data analysis. YX is the senior author and was involved in the study conception, design, data analysis, writing, and editing. All authors contributed to the article and approved the submitted version.