Quality indicators: completeness, validity and timeliness of cancer registry data contributing to the European Cancer Information System

Population-based Cancer Registries (PBCRs) are tasked with collecting high-quality data, important for monitoring cancer burden and its trends, planning and evaluating cancer control activities, clinical and epidemiological research and development of health policies. The main indicators to measure data quality are validity, completeness, comparability and timeliness. The aim of this article is to evaluate the quality of PBCRs data collected in the first ENCR-JRC data call, dated 2015. Methods: All malignant tumours, except non-melanoma skin cancer, and in situ and uncertain behaviour tumours of the bladder were obtained from 130 European general PBCRs for patients older than 19 years. The proportion of cases with death certificate only (DCO%), the proportion of cases with unknown primary site (PSU%), the proportion of microscopically verified cases (MV%), the mortality to incidence (M:I) ratio, the proportion of cases with unspecified morphology (UM%) and the median of the difference between the registration date and the incidence date were computed by sex, age group, cancer site, period and PBCR. Results: A total of 28,776,562 cases from 130 PBCRs, operating in 30 European countries, were included in the analysis. The quality of incidence data reported by PBCRs has been improving across the study period. Data quality is worse for the oldest age groups and for cancer sites with poor survival. No differences were found between males and females. High variability in data quality was detected across European PBCRs. Conclusion: The results reported in this paper are to be interpreted as the baseline for monitoring PBCRs data quality indicators in Europe over time.


Introduction
Population-based Cancer Registries (PBCRs) are tasked with collecting high-quality data, important for monitoring cancer burden and its trends, planning and evaluating cancer control activities, clinical and epidemiological research and development of health policies (1). Therefore, the value of a PBCR is inherent in the quality of its data and the related quality control measures. The main indicators to measure data quality are validity, completeness, comparability and timeliness (2,3).
Validity, or accuracy, refers to the proportion of cases recorded with a given characteristic that actually have that attribute. Completeness indicates the extent to which all incident cancer cases in the area covered by the PBCR are indeed recorded by the PBCR. Comparability is the adherence to common international guidelines. Timeliness refers to how quickly cancer incidence data are collected, processed and reported. There is usually a trade-off between timeliness and both completeness and validity. Cancer data quality indicators include the proportion of cases with death certificate only (DCO%), the proportion of microscopically verified cases (MV%) and the mortality to incidence (M:I) ratio (2-4).
The European Network of Cancer Registries (ENCR) has been operating since 1990 to support the collaboration among European PBCRs. One of the ENCR's main aims is the improvement of the quality and comparability of cancer incidence data. The ENCR Secretariat has been hosted in Ispra, Italy, since 2012 by the Directorate-General Joint Research Centre (JRC), the science and knowledge centre of the European Commission. The JRC supports the ENCR in the harmonisation of PBCR data, with the goal of accurately comparing data between European areas (5).
In 2015 a first ENCR-JRC data call was launched by the ENCR Steering Committee and the JRC to the European PBCRs (6). After harmonisation, EU-wide statistics on incidence and mortality by cancer site, sex, age group and PBCR were computed, feeding the European Cancer Information System (ECIS), the web tool developed and maintained by the JRC to report on the burden of cancer in the EU and Europe (7).
The goal of this study is to evaluate the quality of PBCRs data collected in the first ENCR-JRC data call, dated 2015, using indicators of completeness, validity and timeliness as data quality dimensions. Table 1) were selected for patients older than 19 years. Data quality in children and adolescents will be analysed in a separate publication, since for this age group tumours are grouped by morphology and topography combinations according to the International Classification of Childhood Cancer, and unspecified morphology is defined differently than in adults (8).
All malignant tumours (ICD-O-3.1 behaviour = 3), except non-melanoma skin cancer, and in situ and uncertain behaviour tumours (ICD-O-3.1 behaviour 2 and 1 respectively) of the bladder were included in the analysis.
Among others, the 2015 data call protocol (9) included the following variables: topography, morphology and behaviour, coded according to the International Classification of Diseases for Oncology, Third Edition, (ICD-O-3) (10), as well as basis of diagnosis.
Records with the same patient identification code and tumour identification code were checked, and excluded from the analysis if other variables such as topography, morphology and behaviour were also duplicated.
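The duplicate check described above can be sketched as follows. This is only an illustration under stated assumptions, not the procedure actually applied to the data call: the record layout and field names (`patient_id`, `tumour_id`, `topography`, `morphology`, `behaviour`) are hypothetical, and the sketch assumes that the first record of each fully duplicated group is retained.

```python
from typing import Iterable

# Hypothetical field names; the data call protocol may label them differently.
KEY_FIELDS = ("patient_id", "tumour_id", "topography", "morphology", "behaviour")


def drop_duplicates(records: Iterable[dict]) -> list[dict]:
    """Keep the first record for each combination of KEY_FIELDS (assumption)."""
    seen = set()
    kept = []
    for rec in records:
        key = tuple(rec[field] for field in KEY_FIELDS)
        if key in seen:
            # Same identifiers and same tumour description: treated as a duplicate.
            continue
        seen.add(key)
        kept.append(rec)
    return kept


# Illustrative records (invented values, not registry data).
records = [
    {"patient_id": "P1", "tumour_id": "T1", "topography": "C509",
     "morphology": "8500", "behaviour": 3},
    {"patient_id": "P1", "tumour_id": "T1", "topography": "C509",
     "morphology": "8500", "behaviour": 3},   # exact duplicate of the first record
    {"patient_id": "P1", "tumour_id": "T1", "topography": "C509",
     "morphology": "8520", "behaviour": 3},   # same codes, different morphology: kept
]
unique_records = drop_duplicates(records)
```

Records that share identification codes but differ in topography, morphology or behaviour are deliberately kept, since the text only excludes records in which these variables are duplicated as well.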
Cancer sites were defined with ICD-O-3 topography and morphology combinations reported in Supplementary Table 2.

Quality indicators
Validity, completeness, and timeliness of the PBCRs datasets were evaluated. The following indicators were calculated, with type of indicator specified in italics between parentheses (2, 3):
• DCO% (validity).
Benchmarks for the latest available period (2010-2014) were computed from the first tertile (30%) of PBCRs with the highest performance for each indicator. Two-sided 95% confidence intervals were calculated using the Clopper-Pearson method for DCO%, MV%, UM% and PSU%, a ratio paired t-test for the M:I ratio, and the normal approximation method for the timeliness indicator.
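For readers unfamiliar with the Clopper-Pearson (exact binomial) interval used for the proportion indicators, a minimal sketch is given below. It is not the code used in the study: the function names are hypothetical, only the Python standard library is used, and the bounds are found by bisection on the binomial distribution function rather than through the beta quantiles that statistical packages normally use.

```python
import math


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p), with terms evaluated in log space."""
    if k < 0:
        return 0.0
    if k >= n or p <= 0.0:
        return 1.0
    if p >= 1.0:
        return 0.0
    log_p, log_q = math.log(p), math.log1p(-p)
    total = 0.0
    for i in range(k + 1):
        log_term = (math.lgamma(n + 1) - math.lgamma(i + 1)
                    - math.lgamma(n - i + 1) + i * log_p + (n - i) * log_q)
        total += math.exp(log_term)
    return min(total, 1.0)


def _solve_decreasing(f, target: float) -> float:
    """Bisection for f(p) = target on [0, 1], where f is decreasing in p."""
    lo, hi = 0.0, 1.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if f(mid) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def clopper_pearson(k: int, n: int, alpha: float = 0.05) -> tuple[float, float]:
    """Two-sided exact interval for a proportion with k events among n cases."""
    if not 0 <= k <= n or n == 0:
        raise ValueError("need 0 <= k <= n and n > 0")
    # Lower bound: P(X >= k | p) = alpha / 2, i.e. cdf(k - 1) = 1 - alpha / 2.
    lower = 0.0 if k == 0 else _solve_decreasing(
        lambda p: binom_cdf(k - 1, n, p), 1.0 - alpha / 2.0)
    # Upper bound: P(X <= k | p) = alpha / 2.
    upper = 1.0 if k == n else _solve_decreasing(
        lambda p: binom_cdf(k, n, p), alpha / 2.0)
    return lower, upper


# Illustrative call: 30 DCO cases among 1000 registrations (invented counts).
low, high = clopper_pearson(30, 1000)
```

The interval is conservative by construction, which is a reasonable property when the indicator is compared against a benchmark.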

Results
A total of 28,776,562 cases from 130 PBCRs (21 national and 109 regional), operating in 30 European countries, were included in the analysis (Figure 1). MV%, PSU% and UM% were computed for all 130 PBCRs, DCO% for the 102 PBCRs which had access to death certificate information, M:I ratio for the 92 PBCRs with available mortality data, and timeliness for the 49 PBCRs which provided date of registration. Table 1 includes DCO%, MV%, UM% and M:I ratio by age at diagnosis and cancer site for the period 1995-2014 and timeliness for the period 2000-2014. Results by period for timeliness ( Figure 7; Table 1

Proportion of cases with death certificate only (DCO%)
The highest DCO% was recorded for liver and pancreatic cancer and unknown primary site cases, followed by other haematological malignancies, stomach cancer, brain and central nervous system tumours and lung cancer. The lowest DCO% occurred for testicular cancer, skin melanoma and cervical cancer (Figure 2). When comparing different time periods, a decrease in DCO% was observed over time for all cancer sites, except PSU cases, changing on average from 4.9% in the period 1995-1999 to 3.0% in the period 2010-2014 (Figure 2). In particular, between 1995-1999 and 2010-2014 DCO cases decreased on average from 15.1% to 8.7% for liver, from 10.9% to 7.8% for pancreatic cancer and from 7.9% to 4.5% for stomach (Figure 2).
The DCO% for all PBCRs and all cancer sites combined did not show any difference between males (3.8%) and females (4.0%; data not shown).
Considering the whole analysed period, an increase in DCO% was observed with increasing age, from 1.4% in patients aged 20-59 years at diagnosis, up to 9.4% for those aged 80 and over. Differences by age group were found for most cancer sites. In particular, age groups 20-59 and 80+ had a respective DCO% of 8.1% vs 17.2% for liver, 5.3% vs 15.1% for pancreas, 3.3% vs 14.9% for central nervous system and 1.3% vs 12.1% for ovary (Table 1; Supplementary Figure 1). There was a high variability among PBCRs for this indicator. Whereas the majority of PBCRs had less than 5% DCO cases between 1995 and 2014, 25 out of 102 PBCRs had more than 5% DCOs in at least one of the considered 5-year periods. However, the latter group of PBCRs showed a general improvement for this indicator between 1995 and 2014 (Supplementary Figure 2).

Proportion of microscopically verified cases (MV%)
The lowest MV% occurred for hepatic and pancreatic cancer, followed by lung and central nervous system. The highest MV% was observed for lip and oral cancers, larynx, melanoma, female breast cancer, cancer of the cervix and uterus, testis, thyroid and Hodgkin and non-Hodgkin lymphoma (Figure 3). MV% increased over time across cancer sites, from an average 81% in the period 1995-1999, to 88% for the period 2010-2014. The biggest improvement between 1995-1999 and 2010-2014 was observed for pancreas (43% and 60% respectively), stomach (80% and 91% respectively) and oesophagus (83% and 92% respectively) (Figure 3). The MV% was similar for males and females (85% and 86% respectively; data not shown).
[Figure caption: Proportion of cases with death certificate only (DCO%) by period of diagnosis and cancer site, 1995-2014.]
[Figure caption: Proportion of cases with unknown primary site/primary site uncertain (PSU%) by age group and period of diagnosis, 1995-2014.]
[Figure caption: Mortality to incidence (M:I) ratio by period of diagnosis and cancer site, 1995-2014.]
As for DCO%, a high variability among PBCRs was found, although 117 out of 128 PBCRs had an overall MV% of at least 80% in the latest available period of incidence. MV% increased for most PBCRs between 1995-1999 and 2010-2014 (Supplementary Figure 4).

Proportion of cases with unspecified morphology (UM%)
The highest UM% was found for non-Hodgkin lymphoma (mainly in the period 1995-2004), primary site unknown, pancreas and liver; the lowest was found for testis, thyroid, uterus, and lip, oral cavity and pharynx.
The UM% was 11% for both males and females (data not shown).
A high variability in UM% was observed among PBCRs, although 112 out of 130 PBCRs had an overall UM% below 20% in the latest available period of incidence. As for previously considered indicators, an improvement occurred for most PBCRs between incidence years 1995-1999 and 2010-2014 (Supplementary Figure 6).

Proportion of cases with unknown primary site/primary site uncertain (PSU%)
The PSU% was 3% for both males and females (data not shown). For this indicator, data quality decreased with increasing age (Figure 5).
As with the other indicators presented above, PSU% improved over time for all age groups (Figure 5).
All 130 PBCRs had less than 5% PSU cases in the latest available period, and the indicator decreased for the majority of PBCRs (Supplementary Figure 7).

Mortality to incidence (M:I) ratio
The highest M:I ratio was observed for hepatic and pancreatic cancer, followed by cancer of the oesophagus, lung and stomach. The lowest ratio was observed for testicular cancer, followed by thyroid and melanoma of the skin (Figure 6).
Overall M:I ratio was 0.53 for males and 0.49 for females in the analysed period (data not shown). The

Timeliness
For the 49 PBCRs with available data, the median time from incidence to registration decreased from 781 to 610 days between incidence years 2000-2004 and 2010-2014 (Figure 7). This indicator improved particularly for liver (from 1479 to 830 days), thyroid (from 1259 to 723 days) and bladder (from 1184 to 743 days) and remained relatively low throughout incidence years 2000-2014 for oesophagus, melanoma of the skin and cervix uteri (Figure 7).
The median time to registration was lower for younger patients for the majority of cancer sites, for instance, for cervix uteri (384 vs 582 days respectively for age groups 20-59 and 80+ years) and prostate (448 vs 792 days) ( Table 1; Supplementary Figure 10

Discussion
This article gives an overview of data quality among the European PBCRs contributing to the ECIS in the 2015 ENCR-JRC data call. Reference values were computed for the most recently available incidence period (2010-2014) in order to evaluate data quality for future submissions to the ECIS (Table 2).
Most of the indicators computed in this study have been used at international level for comparing and interpreting cancer data among different PBCRs (2-4, 11). In addition, PBCRs are using them for data quality evaluation (12-21). UM% and PSU% were computed for all 130 PBCRs included in the analysis. These two indicators are based on the topography and morphology variables, which are considered core variables and are available for all PBCRs.
A limitation of this first evaluation is the delay between the latest submissions to the previous ENCR-JRC data call, in 2018, and the present analysis. The benchmarks that were calculated and the experience with the previous data call will help reduce this delay in future data quality assessments in ECIS.
MV% and DCO% were computed from the basis of diagnosis variable, which is also considered a core variable and is available for all PBCRs. Nevertheless, the "death certificate only" category (i.e. basis of diagnosis = 0) of this variable is available only for PBCRs with access to death certificates.
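As an illustration of how both indicators derive from the basis of diagnosis variable, a minimal sketch follows. The code 0 for "death certificate only" is taken from the text above; treating codes 5, 6 and 7 (cytology, histology of a metastasis, histology of the primary tumour) as microscopic verification is an assumption based on the commonly used basis of diagnosis coding, and the use of all registered cases as denominator is likewise assumed. The function name and the counts are invented.

```python
from collections import Counter

DCO_CODE = 0           # stated in the text: basis of diagnosis = 0
MV_CODES = {5, 6, 7}   # assumption: cytology and histology codes


def dco_and_mv_percent(basis_codes):
    """Return (DCO%, MV%) using all registered cases as denominator (assumption)."""
    counts = Counter(basis_codes)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no cases supplied")
    dco_pct = 100.0 * counts[DCO_CODE] / total
    mv_pct = 100.0 * sum(counts[code] for code in MV_CODES) / total
    return dco_pct, mv_pct


# Invented example: 100 cases with their basis of diagnosis codes.
codes = [7] * 85 + [5] * 3 + [2] * 8 + [0] * 4
dco_pct, mv_pct = dco_and_mv_percent(codes)
```

A registry without access to death certificates never records code 0, so its DCO% would be zero by construction rather than by data quality, which is why the indicator was restricted to registries with such access.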
Mortality data by cause of death, sex and age group were not available for 38 PBCRs and M:I ratio could therefore not be computed for these PBCRs.
Only 49 PBCRs submitted the registration date for at least two years in each of the two considered periods (2000-2006 and 2007-2014). Therefore, timeliness, defined as the median of the difference between the registration date and the incidence date, was computed for these 49 PBCRs.
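The timeliness indicator as defined here can be sketched in a few lines. The function name and the example dates are hypothetical, and the sketch assumes that each case carries both a complete incidence date and a complete registration date.

```python
from datetime import date
from statistics import median


def median_days_to_registration(cases):
    """Median of (registration date - incidence date), in days.

    `cases` is an iterable of (incidence_date, registration_date) pairs.
    """
    delays = [(registered - incident).days for incident, registered in cases]
    if not delays:
        raise ValueError("no cases supplied")
    return median(delays)


# Invented example dates, not registry data.
example_cases = [
    (date(2010, 1, 1), date(2011, 1, 1)),
    (date(2011, 1, 1), date(2013, 1, 1)),
    (date(2010, 3, 1), date(2010, 9, 1)),
]
timeliness_days = median_days_to_registration(example_cases)
```

The median is less sensitive than the mean to the few cases registered many years after incidence, which is relevant given the long delays reported for some registries.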
It will not be possible to compute timeliness at European level in the near future, because date of registration is not included in the 2022 Call for Data Protocol for European Population-Based Cancer Registries (22), owing to the low number of PBCRs that submitted this variable in the 2015 ENCR-JRC data call. Nevertheless, this indicator could be useful at PBCR level for improving the efficiency of PBCR procedures (2).
The use of death certificates as an information source is a means for PBCRs to find cases not captured by other registration procedures (23). A higher DCO% is often linked to poor cancer prognosis. A high percentage of DCOs can indicate incompleteness, as well as low validity.
Liver and pancreas were the cancer sites with the highest DCO%. This observation is consistent with data from other PBCRs (12, 16, 18). In any case, the DCO% varies widely across cancer sites. The Finnish PBCR reported an overall DCO% (all sites) of 2.6%, also with large differences between cancers. The highest values were reported for unspecified topographies such as respiratory tract NOS (C37 and C39), other digestive organs (C26) and uterus NOS (C55, C58), with values of 39%, 23% and 20% respectively. The DCO% for pancreas was 9.5% and for liver 4.8% (16).
A decrease in DCO% was observed between 1995-1999 and 2010-2014. This is in line with what was reported for similar periods in Cancer Incidence in Five Continents volumes IX and X (24, 25) and in selected PBCR studies, namely for the Zurich and Zug PBCRs, where the proportion of DCO cases declined between 1997 (6.4%) and 2014 (0.8%) (18). Indeed, a declining DCO% trend is a natural consequence of increasing attention and efforts over time to improve data quality. An important activity aimed at improving PBCRs data quality is carried out by the JRC and the ENCR, in the form of training opportunities, the set-up of working groups to draft guidelines for data coding, registry visits and, most importantly, validation of the cancer registry data itself (26-30).
Although death certificates are available for the majority of the European PBCRs, there is still a considerable percentage (22%) with problems in accessing death certificates. This issue could have an impact on cancer incidence computation and also on survival estimates (31). Nevertheless, DCO% is low for the majority of cancer sites and for the European PBCRs contributing to the ECIS. Therefore, it is unlikely to have a significant impact on data comparability among PBCRs, in particular in the latest period of incidence. Lastly, it should be noted that the proportion of death certificate initiated cases (DCI%) is presently not available in ECIS. This indicator can be an important complement for evaluating DCO% but is still not routinely reported by many European PBCRs (3).
In contrast with the overall value, Iceland reported a low MV% (67%) for liver (13), and Finland reported an MV% of 63% for pancreatic cancer (16).
The highest MV% occurred in the youngest age group and declined with increasing age. This could be explained, at least partially, by a lower diagnostic activity in elderly patients.
An increase in MV% over time was observed, in line with what was reported for similar periods in Cancer Incidence in Five Continents volumes IX and X (24, 25). MV% is mainly considered as a measure of validity, but a very high proportion of cases diagnosed by histology or cytology may also suggest that a PBCR is over-reliant on pathology as a source of information and might not detect part of the cases normally diagnosed by other means (2). As an example, the Swiss PBCRs of Zurich and Zug reported an MV% of 62% for 1997, which increased to 81% for 2014 (19).
The UM% was 11% in the observed periods, with a decrease from 13% in 1995-1999 to 9% in 2010-2014. This decrease was largest for non-Hodgkin lymphoma, liver and stomach, and is at least in part explained by the improvement of diagnostic techniques for these tumours.
The PSU% was overall around 3%. This indicator decreased for the majority of PBCRs over time.
The PSU% reported by the Iceland PBCR was 1.9% for men and 3.1% for women, while it was 2.2% for both sexes in Norway in 2001-2005. In both countries this percentage increased with advancing age (12, 13). Differences in the age distribution of the male and female populations could partially explain these differences. Nevertheless, differences by sex were not found in our study when all PBCRs were considered together.
Since rare tumours are defined by topography and specific morphology (32), UM% and PSU% could have an important impact on rare cancer incidence computation and data comparability.
The M:I ratio declined over time (from 0.57 in 1995-1999 to 0.46 in 2010-2014), confirming the findings from Cancer Incidence in Five Continents volumes IX and X (24, 25) and those reported by selected PBCR studies: in two Swiss PBCRs, the M:I ratio declined from 0.58 in 1980 to 0.37 in 2014 (18). Bulgaria reported an M:I ratio of 0.5 for males and 0.4 for females (15). The higher M:I ratio for males observed in our study (0.53 vs 0.49 for females) is also in line with the usual inverse relationship between this indicator and survival, which is higher in females (3,25).
The M:I ratio can help interpret cancer incidence in PBCRs, by comparing the indicator with cancer incidence rates. A higher M:I ratio could be associated with lower completeness and incidence rates, which should be interpreted with caution (see for instance the example in Supplementary Figure 12). Other factors, such as the quality of death certificates, should also be taken into account in the interpretation of the M:I ratio.
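As a reminder of how the indicator is formed, the M:I ratio divides the number of cancer deaths by the number of incident cases for the same site, area and period. The sketch below uses a hypothetical function name and invented counts; it is not the computation pipeline used for ECIS.

```python
def mi_ratios(deaths_by_site: dict, cases_by_site: dict) -> dict:
    """Mortality to incidence ratio per cancer site.

    Sites without incident cases are skipped, since the ratio is undefined.
    """
    return {
        site: deaths_by_site.get(site, 0) / n_cases
        for site, n_cases in cases_by_site.items()
        if n_cases > 0
    }


# Invented counts for one registry and one period.
deaths = {"liver": 90, "testis": 5}
cases = {"liver": 100, "testis": 100, "thyroid": 0}
ratios = mi_ratios(deaths, cases)
```

A ratio close to or above 1 for a given site means that nearly as many (or more) deaths as incident cases were counted, which may point to under-registration of incident cases as well as to poor survival.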
As a limitation, mortality data were not available for 38 PBCRs at the time of the analysis; these were mostly regional registries. In some cases, data were provided by PBCRs in a different format from the one required in the ECIS data call protocol (e.g. fewer than 18 age classes). Following the analysis, most of the problems related to such data were solved, and updated mortality figures can be found in the ECIS web application (7).
Timeliness was evaluated by computing the median time from incidence to case registration, which ranged between one and four years for the majority of PBCRs recording this information. This is in line with what was reported in a survey performed in 2011, where European PBCRs stated a median time from incidence to data publication (which is related to data registration) of 18 months, with a range between 4 months and 5 years (11). Timeliness indicators have not been frequently reported by PBCRs; however, the reduction in time to registration observed in our analysis (with an average decrease of 171 days between 2000-2004 and 2010-2014) follows a similar trend to that reported by Norway (from over 525 days in 2001 to 261 days in 2005), whereas Iceland reported a median time from date of diagnosis to registration of 238 days (with a range between 49 and 1445 days) (12, 13). Lastly, an increase in time to registration was observed for 3 PBCRs between 2000-2004 and 2010-2014; this could possibly be due to resource constraints, which have been common for smaller regional PBCRs throughout Europe in recent years.
Indicators for European PBCRs such as MV%, DCO% and M:I ratio were found to be similar to those reported for other developed areas worldwide, in particular to North America, Australia and New Zealand (24).

Conclusion and way forward
The quality of incidence data reported by PBCRs has been improving across the study period. Data quality is worse for the oldest age groups and for cancer sites with poor survival. No differences were found between males and females. High variability in data quality could be detected across European PBCRs.
The harmonisation of PBCR data as the input source for the assessment of cancer burden is one of the main aims of the support provided by the JRC to the ENCR to strengthen the basis for monitoring the cancer burden. In order to improve data quality and harmonisation, the JRC and the ENCR have been carrying out several activities over the years, namely the set-up of yearly training agendas and organisation of trainings, the coordination of thematic Working Groups to draft guidelines and recommendations on data coding, and the development and provision of common rules and related validation software to check data compliance with agreed EU-wide standards.
In this context, the results reported in this paper are to be interpreted as the baseline for monitoring PBCRs data quality indicators in Europe over time.

Data availability statement
The data analyzed in this study is subject to the following licenses/restrictions: Requests to access the datasets should be directed to francescogiusti@hotmail.com.

Ethics statement
Ethical review and approval was not required for the study on human participants in accordance with the local legislation and institutional requirements. Written informed consent for participation was not required for this study in accordance with the national legislation and the institutional requirements.

Author contributions
The first draft of the manuscript was written by FG and CM. MB supervised data acquisition. All authors contributed to the article and approved the submitted version.