Incidence, mortality, and survival of hematological malignancies in Northern Italian patients: an update to 2020

Background Hematological malignancies (HMs) represent a heterogeneous group of diseases with diverse etiology, pathogenesis, and prognosis. Accurate registration of HMs by Cancer Registries (CRs) is hampered by the progressive de-hospitalization of patients and by the shift from microscopic to molecular diagnosis. Material and methods The Reggio Emilia Province CR (RE-CR) adopted dedicated software that automatically identifies suspected HM cases by combining several databases. Besides pathology reports, hospital discharge archives, and mortality records, the RE-CR retrieved information from general and biomolecular laboratories. Incidence, mortality, and 5-year relative survival (RS) were reported by age, sex, and 14 main HM categories. Results Overall, 7,578 HM cases diagnosed from 1996 to 2020 were registered by the RE-CR. HMs were more common in males and older patients, except for Hodgkin Lymphoma and Follicular Lymphoma (FL). Incidence increased significantly for FL (annual percent change (APC) = 3.0); for Myeloproliferative Neoplasms (MPN), it increased in the first period (APC = 6.0) and then decreased significantly (APC = -7.4); and for Myelodysplastic Syndromes, it increased only in the first period (APC = 16.4). Over the years, 5-year RS increased significantly for Hodgkin, Marginal Zone, Follicular, and Diffuse Large B-cell Lymphomas, MPN, and Acute Myeloid Leukemia. The dedicated software made it possible to recover 80% of cases automatically; the remaining 20% required direct consultation of medical records. Conclusions The study emphasizes that HM registration needs to collect information from multiple sources. The digitalization of CRs is necessary to increase their efficiency.


Introduction
Hematological malignancies (HMs) are a heterogeneous group of neoplasms, including distinct entities with different etiology, pathogenesis, prognosis, and treatment (1). Various papers dealing with different aspects of HM epidemiology have been published (2)(3)(4)(5)(6)(7)(8)(9), with the most systematic reports on incidence, survival, and prevalence coming from the US Surveillance, Epidemiology, and End Results (SEER) program (10) and from the Cancer Registries (CRs) of the UK and the Nordic countries (11)(12)(13)(14). Registration of HMs is somewhat problematic (15), since the care of many of these diseases, particularly in the initial phases, is no longer based on hospitalization, and because their diagnosis increasingly relies on molecular tests rather than on microscopic assessment alone (16). Thus, CR activity, traditionally based on three characteristic data sources (pathology, hospital, and mortality records), may miss incident cases or register them several years after diagnosis. It has therefore been strongly recommended that CRs add new data sources, such as molecular laboratory and drug prescription databases, to those commonly used to identify cases (17).
In Italy, the most recent national epidemiological data on HMs report incidence estimates for 2019 grouped into four major classes, namely Hodgkin Lymphoma (HL), non-Hodgkin Lymphoma (NHL), Plasma Cell Neoplasms (PCN), and Leukemia (13). The estimates are based on cases registered by CRs from 2010 to 2015. Although the regions of northern Italy had a higher incidence for almost all types of solid tumors, the national HM data were more homogeneous, the only exception being NHLs, which showed a higher incidence in Northern Italy. HM mortality was also comparatively homogeneous, except for NHLs, which again showed a higher rate in Northern Italy. Moreover, survival for the four major HM groups mentioned above improved markedly in both sexes for patients diagnosed in 2005-2009 compared with those diagnosed in 1990-1994 (18). Italian patients affected by HMs showed longer survival than the European average but shorter than that reported in the US (14). Incidence trends were stable for HL, while they were decreasing for leukemia, myeloma, and NHL (for the latter, mainly in females) (19).
This study aimed to describe the incidence, mortality, and relative survival (RS) of patients diagnosed with HMs from 1996 to 2020 in a northern Italian province, using dedicated CR software, a digital health tool that automatically identifies suspected HM cases by combining several databases.

Data sources
HM incidence data from 1996 to 2020 were obtained from the Reggio Emilia CR (RE-CR) (approved by the Provincial Ethics Committee of Reggio Emilia, Protocol no. 2014/0019740, on 4 August 2014). HM cases were defined according to the International Classification of Diseases for Oncology, Third Edition (ICD-O-3) (20) and identified by their ICD-O-3 codes. They were classified into 14 main categories, as reported in Table 1. Diagnostic groups were defined according to the main cell of origin. Groups with a small number of cases were merged into larger groups on the basis of similarities in disease pathology and biology. For this reason, nodular lymphocyte-predominant Hodgkin lymphoma was grouped with classical HL, and plasma cell leukemias were grouped with Multiple Myeloma. The remaining rare subtypes were merged and classified as "others".
The primary information sources of the RE-CR were histopathological reports, hospital discharge records, and mortality data. The RE-CR covered a population of 531,891 inhabitants, and its data were considered of high quality, with 98% histopathological confirmation (compared with a national median of 96%) and a low proportion of Death Certificate Only cases (less than 0.1%, compared with 1% nationally) (21,22). Furthermore, the RE-CR was the only Italian CR with published data on all tumors and their most frequent sites updated to 2020 (23).

Digital health tools
Since HMs differ from solid tumors in pathogenesis, behavior, and classification, we have developed new tools to register these specific neoplasms. New information flows have been added to the traditional information sources and integrated with laboratory tests, diagnostic reports, and information from general practitioners. New information sources for HMs, including laboratory data on immunophenotypic, cytogenetic, and genomic aberrations, are essential to ensure the completeness and accuracy of case records.
In particular, the RE-CR algorithm identifies suspected cases by combining the three traditional databases (pathology reports, hospital discharge archives, and mortality records) with general and biomolecular laboratory tests, according to a deterministic list of diagnoses. Every suspected case is accompanied by a morphology code that comes from the pathology laboratory (SNOMED and ICD-O-3) or, when this is not available, from discharge records (ICD-9-CM) or the cause of death (ICD-10). Any issue (e.g., previous solid or hematological tumors, unclear residential history) is highlighted so that the registrar can manually check for possible errors. In recent years, the RE-CR has developed an algorithm that further improves case quality by matching pathology reports against a library of specific diagnostic terms, thus increasing the specificity of the SNOMED code assigned by the pathologist. For each suspected case, the registrar has access to all the information currently held in the Local Health Authority Data Warehouse (LHADW), ordered by relevance, by date, and by mention of oncological disease. In addition to the primary information sources, the LHADW contains imaging tests, outpatient visits, discharge letters, and therapies provided by the six hospitals of the Reggio Emilia province.
Every case proposed by the algorithm is then validated manually: registrars can confirm the proposed topography and morphology or modify the date of diagnosis and other data. Moreover, the RE-CR algorithm makes it possible to provide detailed information on HMs and to compare RE-CR epidemiological data directly with those from the US SEER program (10).
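The case-triage logic described in this section (flag a suspected case, take the morphology code from pathology first, then from discharge records, then from the cause of death, and list issues for the registrar) can be sketched as follows. This is a minimal illustration under stated assumptions, not the RE-CR implementation: the record fields, the `has_lab_marker` flag, and the trigger code ranges (ICD-O-3 morphology 9590-9992 with behaviour /3, ICD-9-CM 200-208, ICD-10 C81-C96) are hypothetical stand-ins for the registry's deterministic list of diagnoses, which is not reproduced in the text.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical trigger lists standing in for the registry's deterministic list of diagnoses.
ICD10_HM_PREFIXES = tuple(f"C{n}" for n in range(81, 97))    # ICD-10 C81-C96
ICD9CM_HM_PREFIXES = tuple(str(n) for n in range(200, 209))  # ICD-9-CM 200-208


@dataclass
class SourceRecords:
    """Linked records for one resident; all field names are illustrative assumptions."""
    pathology_morphology: Optional[str] = None  # ICD-O-3 morphology, e.g. "9680/3"
    discharge_icd9cm: Optional[str] = None      # hospital discharge diagnosis code
    death_icd10: Optional[str] = None           # cause-of-death code
    has_lab_marker: bool = False                # hypothetical flag from general/biomolecular labs
    previous_tumor: bool = False
    unclear_residence: bool = False


def is_hm_morphology(code: str) -> bool:
    """True for malignant (/3) ICD-O-3 morphologies in the 9590-9992 range."""
    try:
        morphology, behaviour = code.split("/")
        return 9590 <= int(morphology) <= 9992 and behaviour == "3"
    except ValueError:  # malformed code: not exactly one "/" or non-numeric morphology
        return False


def morphology_source(rec: SourceRecords) -> Optional[Tuple[str, str]]:
    """Pick the code by source priority: pathology, then discharge, then death."""
    if rec.pathology_morphology and is_hm_morphology(rec.pathology_morphology):
        return ("pathology", rec.pathology_morphology)
    if rec.discharge_icd9cm and rec.discharge_icd9cm.startswith(ICD9CM_HM_PREFIXES):
        return ("discharge", rec.discharge_icd9cm)
    if rec.death_icd10 and rec.death_icd10.startswith(ICD10_HM_PREFIXES):
        return ("death", rec.death_icd10)
    return None


def triage(rec: SourceRecords) -> Tuple[bool, Optional[Tuple[str, str]], List[str]]:
    """Return (suspected, coded source, issues to be checked manually by a registrar)."""
    source = morphology_source(rec)
    suspected = source is not None or rec.has_lab_marker
    issues = []
    if rec.previous_tumor:
        issues.append("previous solid or hematological tumor")
    if rec.unclear_residence:
        issues.append("unclear residential history")
    return suspected, source, issues


# Usage: a discharge-only case in a patient with a previous tumor.
print(triage(SourceRecords(discharge_icd9cm="2024", previous_tumor=True)))
# → (True, ('discharge', '2024'), ['previous solid or hematological tumor'])
```

A laboratory-only signal marks a case as suspected without a morphology code, which mirrors the text's point that such cases still need manual review before a code is assigned.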

Statistical analyses
The units of analysis are new HMs (the principal ICD-O-3 topography codes were C09, C16-18, C34, C38, C41-42, C44, C48, C77, C71, and C80) occurring from 1996 to 2020. Second malignancies after nonhematologic malignancies were included, whereas second malignancies after a registered HM were excluded. Descriptive analyses by age, sex, 14 main HM categories (Table 1), and calendar period of diagnosis are presented. Age-specific incidence and mortality rates were calculated using the population of the Province of Reggio Emilia (recorded on 1 January of each year) as the denominator. The direct method was applied to standardize incidence rates, using the 2013 European Standard Population as the reference. Relative survival estimates net survival, i.e., survival in the absence of other causes of death. It is defined as the ratio of the proportion of observed survivors in a cohort of cancer patients to the proportion of expected survivors in a comparable set of cancer-free individuals. For each HM group, relative survival was estimated with the Pohar Perme method; with this estimator, net survival for a cohort is estimated by weighting by the inverse of the individual-specific expected survival probabilities. The weights inflate the observed person-time and number of deaths to account for person-time and deaths not observed because of mortality due to competing causes (24). To estimate general population mortality, we used calendar-year-specific life tables for the Province of Reggio Emilia provided by ISTAT (25). Changes in RS across the four 5-year periods (from 1996 to 2015) were tested for trend using Poisson regression models, with period as a continuous independent variable, the number of events predicted from the Pohar Perme 5-year RS estimate for each period as the outcome, and the period-specific incident cases as the exposed population. Analyses were performed using Stata 16.1 software. Trends over time were analyzed by calculating the annual percent change (APC) in age-standardized rates using Joinpoint Regression analysis (26).
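Two of the computations named above, direct age standardization and the APC, can be illustrated with a short sketch. It is a simplified illustration under stated assumptions, not the registry's code: the function names are hypothetical, the 2013 European Standard Population weights are not reproduced and must be supplied by the user, and, unlike Joinpoint Regression, the APC function fits a single log-linear segment by ordinary least squares, without searching for change points or computing confidence intervals.

```python
import math
from typing import Sequence


def age_standardized_rate(cases: Sequence[int], person_years: Sequence[float],
                          std_weights: Sequence[float], per: float = 100_000) -> float:
    """Direct method: weighted mean of age-specific rates, weights from a standard population."""
    if not (len(cases) == len(person_years) == len(std_weights)):
        raise ValueError("inputs must have one entry per age group")
    weighted = sum(w * c / py for c, py, w in zip(cases, person_years, std_weights))
    return per * weighted / sum(std_weights)


def annual_percent_change(years: Sequence[float], rates: Sequence[float]) -> float:
    """Single-segment APC: OLS fit of ln(rate) = a + b*year, then APC = 100*(exp(b) - 1)."""
    n = len(years)
    log_rates = [math.log(r) for r in rates]
    mean_x = sum(years) / n
    mean_y = sum(log_rates) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, log_rates))
    sxx = sum((x - mean_x) ** 2 for x in years)
    return 100 * (math.exp(sxy / sxx) - 1)


# Toy inputs (not RE-CR data): two age groups, then rates growing by 3% per year.
print(round(age_standardized_rate([1, 40], [1_000, 1_000], [3, 1]), 6))  # → 1075.0
years = list(range(1996, 2006))
print(round(annual_percent_change(years, [10 * 1.03 ** (y - 1996) for y in years]), 6))  # → 3.0
```

On the log scale used here, an APC of 3.0%, as reported for FL, corresponds to a slope of ln(1.03), about 0.030 per year.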

Results
In the province of Reggio Emilia, the dedicated software, introduced in 2012, made it possible to record over 80% of cases automatically, while for the remaining 20% the information had to be retrieved by directly consulting the medical records. Overall, 7,578 incident HM cases were registered (4,116 in males and 3,462 in females), with a higher incidence in males (male/female ratio = 1.2); the only exception was Follicular Lymphoma (FL), which was more frequent in women (male/female ratio = 0.9) (Table 2).
The probability of being diagnosed with an HM increased significantly with age (Table 3). In some HM groups, only a limited number of cases were recorded in patients younger than 18 years. Conversely, most tumors affected patients older than 65 years, except for HL and FL, which were more frequent in patients aged 18-65 years (Table 3). In patients younger than 18 years, in addition to the 34 cases of HL, we recorded 93 cases of precursor B- or T-lymphoblastic lymphoma/leukemia (45% of the ALL subgroup). The overall median age at HM diagnosis was 70 years, compared with 40 years in HL and 21 years in ALL.
Table 4 shows the numbers and age-standardized incidence rates of HMs by quinquennium over the 25 years of the study. An increase in incidence is apparent for the FL, MPN, and MDS groups. However, the low number of cases in the 1990s may partly reflect under-registration, because registration of myelodysplastic syndromes by cancer registries was not mandatory in those years. An increasing incidence trend was generally observed for all groups except Chronic Lymphocytic Leukemia/Small Lymphocytic Lymphoma (CLL/SLL), which showed a decreasing trend in the last period.
Annual age-standardized incidence and mortality rates were used to describe HM trends from 1996 to 2020 (Figure 1). A significant increase in the incidence rate was registered for FL (Annual Percent Change, APC: 3.0%; 95%CI 1.4, 4.6). By contrast, MPN showed an initial increase until 2013 (APC: 6.0%), followed by a significant decrease (APC: -7.4%), whereas MDS increased significantly only in the first period (APC: 16.4%). The mortality trend showed a significant increase for HL until 2008 (APC: 13.1%; 95%CI 3.8, 23.1), followed by a substantial decrease in the last period (APC: -9.8%; 95%CI -15.7, -3.5) (Figure 1).
Table 5 compares incidence, mortality, and survival recorded in the RE-CR with those recorded in the US Surveillance, Epidemiology, and End Results (SEER) program.
In general, where comparable, the incidence rates recorded in our study are slightly higher than those in the US, except for CLL/SLL. Mortality data are in line with those from the US registry, whereas survival is higher except for CLL/SLL and PCN.

Discussion
HMs are a heterogeneous group of neoplasms that pose many challenges in the diagnostic pathway. Our work aimed to describe HM trends over 25 years of registration by the RE-CR, from 1996 to 2020, using new dedicated software that automatically identified suspected HM cases by combining several databases. It is accepted that consulting the three historical archives (pathology, hospital discharge records, and mortality) is sufficient for solid tumor registration, for which two variables (topography and morphology) are mandatory (27). For HMs, we have previously shown that including additional information sources improves case completeness by 4.2% (17). Our algorithm allowed us to improve the quality of HM case collection, coding, and classification. Artificial Intelligence (AI) applications could further support this approach by improving timeliness and sensitivity (28)(29)(30). Learning algorithms could, for example, be used to produce stochastic estimators (e.g., the positive predictive value for the HM) and could increase the automation of our current process. Attempts have been made in our Institute to use AI for diagnostics (31)(32)(33)(34), the treatment of some neoplasms (35)(36)(37), and screening programs (38)(39)(40). Although the present study is limited to a single Italian CR, it may contribute to a proper CR classification for hematologists in their clinical practice.
Reggio Emilia Cancer Registry. Incidence and mortality age-standardized rates (European Standard Population 2013) per 100,000 person-years (p-y) in the Province of Reggio Emilia in the period 1996-2020 by HM categories.
Our study reports 7,578 cases of HMs; they mainly affected males (except for Follicular Lymphoma) and elderly patients, with a median age of 70 years, except for HL (median age 40 years) and ALL (median age 21 years).
Concerning single malignancies, the HL incidence rates registered in our study (3.9 and 4.5 cases per 100,000 in males and females, respectively) are very similar to those reported in Northern Italy (4.1 and 3.4 cases per 100,000 in males and females, respectively) (18) and close to data from other Western countries (41). Five-year RS has increased compared with the early study period (92.4% in 2011-2015 vs. 81.3% in 1996-2000), in line with a previous Italian study (42), confirming the high curability of HL. Our data did not confirm the higher mortality rates reported in a recent population-based study (Excess Mortality Ratio = 1.26; 95% CI 1.04-1.54) (11).
For NHL, US and Italian data report a stable incidence trend in recent years, with a significant drop in mortality and an increase in survival, mainly observed for B-cell NHL owing to the introduction of immunotherapy in the late '90s (3)(43)(44)(45)(46)(47)(48). Our study described incidence, mortality, and survival for each primary NHL subtype. With some limitations due to the small number of rarer histotypes, we showed a significant drop in the incidence rates of CLL/SLL and MZL and an increase in FL. All other NHL subtypes had stable rates. Mortality rates followed trends similar to incidence, although survival increased during the study period for B-cell lymphomas, reaching statistical significance for DLBCL patients. Similar to our results, a Swedish study reported a 47% increase in the 5-year prevalence of NHL overall in 2016 compared with 2004; such results should be used to better evaluate the burden of disease and to improve healthcare planning and resource allocation (12). The notable improvement in DLBCL survival is an expected finding, mainly due to the addition of anti-CD20 monoclonal antibodies to polychemotherapy since the early 2000s (49), and might also reflect healthcare improvements, including increasing access to effective treatments for elderly patients (50). Of note, consistent with their indolent nature and the availability of treatments, 5-year RS was close to 100% in both MZL and FL. Conversely, peripheral T-cell lymphoma (PTCL) was confirmed as the lymphoma subtype with a relatively poor outcome, reflecting the lack of effective therapies.
Reggio Emilia Cancer Registry. 5-yr Relative Survival, test for trends using Poisson models, by HM categories and calendar period.
The incidence of PCN in our study is in line with that reported in northern Italy (11.2 and 8.1 cases per 100,000 per year in males and females, respectively) (18). Interestingly, incidence decreased in both sexes while mortality was stable. The 5-year RS in Italy is 53%, slightly better than the 49% recorded in our analysis. Mortality has also been reported to decrease (2); in particular, the introduction of proteasome inhibitors in 2003 and immunomodulators in 2006 led to a marked increase in survival. For example, the increase in 5-year RS from 37% to 48% between cases diagnosed in 1994-2000 and those diagnosed in 2001-2006 was similar to that observed at the Mayo Clinic over the same periods (from about 35% to 45%) (7).
Myeloid malignancies are generally more frequent in Western countries and predominate in males (6,51), probably owing to differences in exposure to risk factors, including occupational exposures (52), and to aging (13). However, because this group includes a considerable variety of subtypes, incidence and survival can vary even within the same category: for example, 5-year survival ranges from 5% for aggressive entities such as therapy-related AML to 85% for indolent or treatable conditions such as chronic myeloid leukemia; again, males usually experience worse survival than females (48.8% vs. 60.4%, respectively) (13).
The completeness and accuracy of the clinical and pathological information are strengths of our study. In addition, ours is the only Cancer Registry in Italy that has explored how many additional cases could be detected by accessing information sources other than those routinely feeding Cancer Registries. Our results suggest that such a procedure is important, as well-informed population studies are required to inform aetiological hypotheses, healthcare planning, and assessment of the impact of new therapies (17). Furthermore, as a population-based study, it is not limited to information from hospital centres or from patients enrolled in clinical trials; today more than ever, such population-based longitudinal data are needed (13,14).
The interpretation of our results is limited by the lack of individual information on treatments, which have substantially affected survival in recent years, and on comorbidities, which can influence the choice of treatment and therefore the outcome. Moreover, some of the categories were clinically based and may not fully correspond to conventional classifications. Finally, the small numbers observed in some predefined groups made it impossible to analyze them separately, so we merged them with pathologically close groups. These choices might affect the precision of some of our observations but should not affect the study's main message.
In conclusion, we have provided updated data on HM incidence and outcomes from the Cancer Registry of Reggio Emilia province, a highly productive and industrialized territory representative of Northern Italy. More importantly, our study could provide the basis for future programs on HM control, patient care, and cancer research. This goal can be reached by offering well-structured databases that include biomolecular information, as suggested by the WHO classification, which would also allow cross-validation of new prognostic indicators for HMs.

Data availability statement
The raw data supporting the conclusions of this article will be made available by the corresponding author, without undue reservation.

Ethics statement
This population-based cohort study uses data from the Reggio Emilia Cancer Registry, approved by the Provincial Ethics Committee of Reggio Emilia (ref. no. 2014/0019740 of 4 August 2014). For the execution of the study, the Ethics Committee authorized the processing of personal data even in the absence of consent, including data revealing the health status of patients who are deceased or untraceable.