Electronic Support for Retrospective Analysis in the Field of Radiation Oncology: Proof of Principle Using an Example of Fractionated Stereotactic Radiotherapy of 251 Meningioma Patients

Introduction: The purpose of this study is to verify the possible benefit of a clinical data warehouse (DWH) for retrospective analysis in the field of radiation oncology. Material and methods: We manually and electronically (using the DWH) evaluated demographic, radiotherapy, and outcome data from 251 meningioma patients who were irradiated from January 2002 to January 2015 at the Department of Radiation Oncology of the Erlangen University Hospital. Furthermore, we linked the Oncology Information System (OIS) MOSAIQ® to the DWH in order to gain access to irradiation data. We compared the manual and electronic data retrieval methods in terms of data congruence, time required, and personnel requirements (physician, physicist, scientific associate). Results: The electronically supported data retrieval (DWH) yielded an average of 93.9% correct data, a significantly (p = 0.009) better result than manual data retrieval (91.2%). Utilizing a DWH enables the user to replace a large amount of manual activity (668 h) and offers the ability to significantly reduce data collection time and labor demand (35 h) while simultaneously improving data quality. In our case, work time for manual data retrieval was 637 h for the scientific assistant, 26 h for the medical physicist, and 5 h for the physician (total 668 h). Conclusion: Our study shows that a DWH is particularly useful for retrospective analysis in the radiation oncology field. Routine clinical data for a large patient group can be provided ready for analysis to the scientist, and data collection time can be significantly reduced. Furthermore, linking multiple data sources in a DWH offers the ability to improve data quality for retrospective analysis, and future research can be simplified.

INTRODUCTION
Routinely documented clinical data are of great importance for patient care as well as for research purposes (1,2). So far, retrospective analyses in medical research have been performed predominantly by hand, meaning that clinical data are often transferred manually from routine clinical reports into a separate research database (3) and stored in standard office tools (e.g., Microsoft Excel spreadsheets), which are not validated for clinical research. The continuously increasing expansion of electronic documentation in the clinical treatment process creates a large number of different databases (4); thus, manual retrospective analysis is currently quite demanding and time consuming.
In the field of radiation oncology, data sets are large and heterogeneous (5). Electronic information systems contain patients' data for imaging in the Radiology Information System and Picture Archiving and Communication System, for irradiation in the Clinical Information System (CIS), e.g., Oncology Information System (OIS, MOSAIQ®), and data on the current course of the patients' disease in the electronic health record (EHR, e.g., Soarian® Clinicals).
With the increasing amount of patient information captured in EHRs and CISs, more opportunities should be established to facilitate clinical research by obtaining routine clinical data from distributed databases for secondary use, although providing access to routine clinical data for secondary use is challenging in practice (6). One of the greatest challenges in clinical research is to define and implement health data standards for integration between routinely used subsystems (7,8). Medical data are frequently distributed across multiple electronic information systems of several departments in different documentation styles (9). Although most university hospitals have already implemented commercial hospital information systems and started to develop comprehensive EHRs, there is still a gap between clinical care and the use of these data for medical research that needs to be filled (10,11). Recent studies have focused on providing routine clinical data for research purposes, e.g., by using a single-source tumor documentation or supporting systems for patient recruitment into clinical trials in the field of radiation oncology (12) and intensive care (13).
Data warehouses (DWHs) are central repositories of integrated data from one or more disparate sources. They store current and historical data and are used for creating analytical reports for knowledge workers throughout the enterprise (14). The purpose of this study is to verify the possible benefit of a DWH for retrospective analysis and reflect differences in manual and automated data retrieval.
Using meningioma patients as an example, we performed a therapy evaluation by utilizing an integrated electronic research database system DWH (clinical DWH) of the Erlangen University Hospital (UKER) to make routine radiotherapy data available from various operational subsystems. This is one of the largest populations of meningioma patients treated with stereotactic radiotherapy (SRT) in a single institution with a comprehensive database due to a high overall survival rate and a long observation period of meningioma patients after SRT. 1 We manually and electronically collected basic information (patient characteristics), radiotherapy, and outcome data of 251 meningioma patients, who were irradiated from January 2002 to January 2015 at the Department of Radiation Oncology of the UKER (see text footnote 1). Currently, manual data collection represents the "gold standard." In our study, we compared the results of both the electronic and manual data retrieval process and determined the congruence of data. Moreover, we measured the corresponding time requirements for both retrieval methods and the involvement of personnel (physician, physicist, scientific associate).

MATERIALS AND METHODS
Environment
Erlangen University Hospital (UKER) is a tertiary care hospital that has 1,368 beds and combines 24 departments, 18 independent divisions, 7 institutes, and 25 interdisciplinary centers. In 2015, over 60,000 inpatient and nearly 475,000 outpatient cases were treated (15). At the Department of Radiation Oncology, 130-150 patients with many different tumor entities are irradiated daily. Approximately 32 patients with meningioma are irradiated annually.
For our study, an agreement for the usage of routine clinical data was signed by those departments of the UKER that were involved in the patients' treatment (Neurosurgery, Neurology, Neuropathology, and Radiology). These regulatory requirements and institutional policies need to be reconciled to use clinical routine data for clinical research activities.

Principles of Radiotherapy of Intracranial Meningioma
During the past two decades, SRT has become increasingly well known as a treatment option for meningiomas (16,17). Adjuvant SRT is offered to all grade II and III meningioma patients, whereas symptomatic grade I meningioma patients receive SRT only after incomplete resection. Inoperable grade I (symptomatic only) and grade II and III meningiomas are treated with primary SRT. SRT was performed using the stereotactic radiosurgery system Novalis™ (BrainLAB, Feldkirchen, Germany). Patients were treated on consecutive workdays, with one fraction per day (see text footnote 1). SRT was mostly given in 28, 30, or 25 fractions to a median reference dose of 54.0 Gy.
Scientific Objective of the Retrospective Analysis
with SRT at the Department of Radiation Oncology of the UKER, we have illustrated the workflow of manual and electronically supported data retrieval for this analysis. To determine the efficacy of SRT on long-term outcome (e.g., overall survival, local control), the relevant parameters (age, gender, tumor localization, WHO grading, and current disorders after radiotherapy), data from computed tomography (CT) or magnetic resonance (MR) imaging (to determine the tumor status after therapy), and temporal dose distribution [fractionation, target volume (PTV), dose distribution of risk organs] were evaluated.
Workflow of Manual and Electronically Supported Data Collection for Retrospective Analysis

Manual Data Retrieval
For the purpose of retrospective analysis, the Department of Radiation Oncology begins with specifying the research question and defining the patient collective. Here, the patient collective was identified by multiple reference sources (e.g., outdated medical records and databases, institutional statistics) and manually summarized in a separate chart (Microsoft Excel 2010). All medical data in the routine CISs and necessary data elements for each patient were manually and separately noted in an electronic document using Microsoft Excel 2010. The systems used for manual analysis are listed in Table 1.
To evaluate the time required for manual data retrieval, we documented the time needed to collect all necessary data elements from clinical source systems and manually transcribed them into an Excel spreadsheet.

Electronically Supported Data Retrieval
In order to simplify retrospective analysis, we decided to use a tool that obtains routine clinical data from multiple CISs for secondary use. Since 2003, the UKER has provided the clinical DWH research platform to scientists for numerous analyses. It has the ability to combine data from multiple clinical source systems and to provide them to the hospital users. The DWH stores clinical and administrative data from 22 different data sources (e.g., Accounting, Pharmacy, Surgery, Anesthesia, Pathology, and Radiology). For transformation of routine clinical data, it utilizes the open enterprise-class platform Cognos Data Manager. The database language Structured Query Language (SQL) is used for defining data structures, editing, and querying the databases.
We used the DWH for defining a patient collective and obtaining routine clinical data from multiple CISs. The workflow of manual and electronically supported data retrieval for retrospective analysis is illustrated in Figure 1.
A database query based on routine clinical data from patient care was initiated to design a core data set for retrospective analysis (date of the last contact, date of the last imaging, life-status, beginning and end of the radiotherapy, fractionation, and dose). Selected data elements and the related data source system are shown in Table 2.
The official system used for coding diagnoses is the German Modification of the 10th Revision of the International Statistical Classification of Diseases (ICD-10-GM), and for procedures the German "Operationen- und Prozedurenschlüssel Version 2015." Currently, not all listed data elements or source systems are accessible to the DWH (e.g., tumor as cause of death in the GTDS, the minimum or maximum dose, PTV volume, PTV coverage, dose distribution on risk organs documented in the treatment planning software), or no suitable methods were available for the extraction of the data elements (e.g., tumor localization, WHO grading, or several radiotherapy data elements documented in Soarian® Clinicals) at the time of analysis (Table 2). Therefore, they are not included in the electronic analysis.
Integrating OIS MOSAIQ® into the Clinical DWH of the UKER: Reusing Data from the OIS MOSAIQ® for Retrospective Analysis
Since 2012, the Department of Radiation Oncology has used the OIS MOSAIQ® developed by Elekta (Hamburg, Germany). It provides medical oncology data (e.g., demographic data, diagnoses, beginning and end of the radiotherapy, planned and administered fractionation and doses), controls the respective linear accelerator, and is linked to imaging, planning, and therapy systems.
In order to make irradiation data available for retrospective analysis, we analyzed the table structure of the clinical system and transferred a copy of relevant data tables as a read-only user during the non-productive clinical period of radiotherapy (after 5 p.m.) into the staging area of the DWH. This process is called "extraction." As a next step, we queried the DWH to select patients with a diagnosis of meningioma (ICD-10-GM codes D32.0, D32.9, C70.0, C70.9) and to identify the data elements beginning and end of the radiotherapy, planned and administered fractionation, and dose distribution. Subsequently, we compared the results of the database query and the manual data retrieval.
In addition, unnecessary or inconsistent data can be corrected or removed in the staging area. This process is called "transformation." The entire process is called ETL (extraction, transformation, loading) (18). The structure of the DWH and the technical implementation of the clinical source system MOSAIQ® are illustrated in Figure 2.
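The extraction and transformation steps described above can be sketched in a few lines. The following is a minimal illustration, not the UKER implementation: it uses Python's built-in sqlite3 module in place of Cognos Data Manager, and the table names (course, staging_course, dwh_meningioma), the column names, and the sample rows are hypothetical. Only the four ICD-10-GM codes are taken from the text.

```python
import sqlite3

# ICD-10-GM codes for meningioma as listed in the text.
MENINGIOMA_CODES = ("D32.0", "D32.9", "C70.0", "C70.9")
COLUMNS = "patient_id, case_id, icd_code, rt_start, rt_end"  # hypothetical schema


def extract(source, staging):
    """Extraction: copy the source table unchanged into the staging area."""
    staging.execute(f"CREATE TABLE staging_course ({COLUMNS})")
    rows = source.execute(f"SELECT {COLUMNS} FROM course").fetchall()
    staging.executemany("INSERT INTO staging_course VALUES (?, ?, ?, ?, ?)", rows)


def transform_and_load(staging):
    """Transformation and loading: drop duplicates and inconsistent rows,
    keep meningioma cases, and write them to the report table."""
    marks = ", ".join("?" for _ in MENINGIOMA_CODES)
    staging.execute(f"CREATE TABLE dwh_meningioma ({COLUMNS})")
    staging.execute(
        f"""
        INSERT INTO dwh_meningioma
        SELECT DISTINCT {COLUMNS}
        FROM staging_course
        WHERE icd_code IN ({marks})
          AND rt_start IS NOT NULL
          AND rt_end IS NOT NULL
          AND rt_start <= rt_end
        """,
        MENINGIOMA_CODES,
    )


def demo():
    """Run the sketch on made-up rows and return the loaded (patient, case) pairs."""
    source = sqlite3.connect(":memory:")
    source.execute(f"CREATE TABLE course ({COLUMNS})")
    source.executemany(
        "INSERT INTO course VALUES (?, ?, ?, ?, ?)",
        [
            (1, 10, "D32.0", "2013-01-07", "2013-02-15"),
            (1, 10, "D32.0", "2013-01-07", "2013-02-15"),  # duplicate row
            (2, 20, "C71.9", "2013-03-04", "2013-04-12"),  # not a meningioma code
            (3, 30, "C70.0", "2014-05-05", None),          # missing end date
            (4, 40, "D32.9", "2014-06-02", "2014-07-11"),
        ],
    )
    staging = sqlite3.connect(":memory:")
    extract(source, staging)
    transform_and_load(staging)
    return staging.execute(
        "SELECT patient_id, case_id FROM dwh_meningioma ORDER BY case_id"
    ).fetchall()
```

In this sketch the clinical source is only read, never written, which mirrors the read-only access described in the text; all cleaning happens on the staging copy.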

Statistical Analysis and Ethics Committee Vote
Standard summary statistics and two-tailed 95% confidence intervals were calculated as appropriate. All statistical analyses were performed using the Statistical Package for the Social Sciences version 21 (IBM Corp., Armonk, NY, USA). The level of significance for all analyses was set at α = 0.05 (two-tailed).
Our institution obtained a positive ethics committee vote from the ethical review board for our research (reference number 347_16 Bc). All data used for the retrospective analysis was in anonymized form.

RESULTS
Effectiveness of Patient Data Collection-DWH
A total of 275 data sets (case IDs) from 251 patients (patient IDs) were manually collected and stored in a Microsoft Excel spreadsheet. There are 275 data sets because some patients had more than one lesion and were therefore irradiated multiple times.
Two hundred seventy-four data sets (100%) from 250 patients were collected electronically because one patient's data were not available for data protection reasons. The congruence of the data elements "beginning and end of the radiotherapy, date of the last contact, date of the last imaging, and life-status (alive, dead)" was evaluated by comparing the manual data retrieval with the results of the DWH report.
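A congruence check of this kind can be sketched as a comparison of two lookups keyed by case ID. The function below is an illustrative assumption, not the procedure used in the study: the names congruence, manual, and electronic are hypothetical, and the sketch only flags cases where the two sources disagree. Deciding which source holds the correct value still requires a look at the primary documentation.

```python
from typing import Dict, List, Tuple


def congruence(
    manual: Dict[int, str], electronic: Dict[int, str]
) -> Tuple[float, List[int]]:
    """Return the share of identical values and the case IDs that differ.

    Only case IDs present in both sources are compared.
    """
    shared = sorted(set(manual) & set(electronic))
    mismatches = [cid for cid in shared if manual[cid] != electronic[cid]]
    rate = (len(shared) - len(mismatches)) / len(shared) if shared else 0.0
    return rate, mismatches


# Made-up example values for one data element (start of radiotherapy).
manual = {1: "2013-01-07", 2: "2013-03-04", 3: "2014-05-05"}
electronic = {1: "2013-01-07", 2: "2013-03-05", 3: "2014-05-05", 4: "2014-06-02"}
rate, differing = congruence(manual, electronic)
```

Here case 2 is flagged as differing, and case 4 is ignored because it has no manual counterpart.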

Manual Data Retrieval Compared with the Results of the DWH Report
The summary of selected data elements determined by manual and electronically supported data retrieval is shown in Table 3. identical. Thirty-nine (22 manual, 17 electronic) data elements were not identical. Deviating results were more often generated by the manual than by the electronic data retrieval method. Manual data retrieval produced 22/274 (8%) deviating results: in these 22 cases, the treatment date of radiation was incorrectly documented in the discharge letter, and the incorrect dates were transferred into the Microsoft Excel spreadsheet.
The DWH determined the correct treatment date for these 22 patients. However, the DWH query produced 17/274 (6.2%) deviating results due to an error in the database query. The query was carried out on a patient basis (patient ID) instead of a case basis (case ID). If a patient (patient ID) was treated multiple times over several years (case IDs), only the latest "date of beginning and the end" was identified. For a flawless determination of the treatment date per case (case ID), the SQL statement of the database query has to be adjusted for future data exports.
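The difference between a patient-based and a case-based query can be illustrated with a small example. This is a sketch under assumptions: the actual SQL statement used in the study is not given in the text, and the table course, its columns, and the dates are hypothetical. It uses Python's built-in sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE course (patient_id, case_id, rt_start, rt_end)")
conn.executemany(
    "INSERT INTO course VALUES (?, ?, ?, ?)",
    [
        (1, 10, "2005-03-01", "2005-04-11"),  # first course of patient 1
        (1, 11, "2012-09-03", "2012-10-12"),  # second course, years later
        (2, 20, "2010-01-11", "2010-02-19"),
    ],
)

# Patient-based: one row per patient, so the earlier course of patient 1 is lost.
patient_based = conn.execute(
    "SELECT patient_id, MAX(rt_start), MAX(rt_end) "
    "FROM course GROUP BY patient_id ORDER BY patient_id"
).fetchall()

# Case-based: one row per treatment case, so every course is kept.
case_based = conn.execute(
    "SELECT patient_id, case_id, MIN(rt_start), MAX(rt_end) "
    "FROM course GROUP BY patient_id, case_id ORDER BY patient_id, case_id"
).fetchall()
```

Grouping by case ID in addition to patient ID is one way to keep repeated treatments apart; the adjustment actually needed depends on the schema of the real DWH tables.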

Data element "Date of the last imaging"
Of the 274 data elements, 248 (90.5%) retrieved manually and 236 (86.1%) retrieved electronically were identical.
Differing results are more often generated by the electronic (38/274) than by the manual (26/274) data retrieval method. Manual data retrieval produced 9.5% of inconsistent data: this difference
Figure 2 | Integrating Oncology Information System (OIS) MOSAIQ® into the clinical data warehouse (DWH) of the UKER for secondary use: we transferred a copy of relevant data tables as a read-only user during the non-productive clinical stage of radiotherapy (after 5 p.m.) into the staging area of the DWH (extraction). As a next step, we queried the DWH to identify relevant data elements (beginning and end of the radiotherapy, fractionation, and dose).
accessible by a database query as it is based on the documented procedure code in the source system of the UKER.

Data element "Date of the last contact"
All data elements collected electronically were identical. Deviating results were caused only by the manual data retrieval method (42/274, 15.3%). There were two reasons for this: first, for 18 patients the date of the last contact was incorrectly transferred from the source system into the Excel spreadsheet. Second, during the time of analysis, 24 patients were being treated again in another department at the UKER, and the manually collected data were therefore already outdated.
Data element "Life-status"

Effectiveness of Patient Data Collection-OIS
Fractionated SRT has been documented in the OIS MOSAIQ® since June 2012. We identified 110 suitable values for 74 patients (74 stereotactic irradiation + 36 data values for boost irradiation) recorded since the system went into operation at the Department of Radiation Oncology and transferred them into the DWH. We collected the data elements "beginning and end of radiotherapy, distributed dose and fractionation" by querying the DWH and compared the results with the manual data retrieval.
Manual Data Retrieval Compared with the Results of the MOSAIQ® Report
Data element "Beginning of the Radiotherapy" and "End of Radiotherapy"
Differing results were caused only by the manual data collection method (22/110): because of an incorrect date in the medical discharge letter, the manually retrieved data deviated for the beginning and for the end of radiotherapy. There were no deviating results when querying the source system MOSAIQ® (DWH report) because the linear accelerator is controlled by the OIS, which applies validation rules to data entry for every single fraction in the primary source system.

Data element "Administered Dose and Fractionation"
In all, 94.6% (70/74) of the data elements were identical. The manual data retrieval method led to 4 (5.4%) deviating results because a medical physicist determined 4 incorrect data elements of administered dose and fractionation on the basis of the paper-based health record, the OIS MOSAIQ®, and the treatment planning systems I-plan RT® or Pinnacle 3®. There were no deviating results when querying the source system MOSAIQ® (DWH report).

Time Invested in Manual Data Retrieval
To evaluate the time required for manual data retrieval, we documented the time needed to collect all necessary data elements from clinical source systems and manually transcribe them into a Microsoft Excel spreadsheet. The manual data retrieval required 668 h (Figure 3). The collection of all data elements took place over an extended period of about 24 weeks.
The scientific assistant required the largest amount of time for manually collecting routine clinical data: 637 h (95.4%). The support of a physician (5 h, 0.7%) and a medical physicist (26 h, 3.9%) was also required (Figure 3). The physician analyzed current MR or CT imaging (to determine localization, relapse, and progression of the tumor), and the medical physicist evaluated necessary data elements (PTV volume, fractionation, doses, minimum/maximum dose, PTV coverage, dose distribution of risk organs) on the basis of the paper-based health record and the treatment planning systems I-plan RT® or Pinnacle 3®.
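The percentage shares quoted above follow directly from the reported hours. A short check, using only the figures from the text (the variable names are illustrative):

```python
# Hours of manual data retrieval per professional group, as reported in the text.
hours = {"scientific assistant": 637, "medical physicist": 26, "physician": 5}

total = sum(hours.values())
# Share of the total working time per group, rounded to one decimal place.
shares = {role: round(100 * h / total, 1) for role, h in hours.items()}
```

This reproduces the total of 668 h and the shares of 95.4%, 3.9%, and 0.7%.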

Time Consumption for Electronic Data Retrieval
The DWH report was developed in collaboration with a computer scientist of the Department of Medical Informatics and two scientific assistants of the Department of Radiation Oncology of the UKER. Implementing the DWH query took 30 h, comprising the definition, adjustment, and execution of the database query. For administrative activities (e.g., obtaining permission for data access from those departments of the UKER that were involved in the patients' treatment), we needed an additional 5 h.
The support of a medical physicist was not required to evaluate data elements (beginning and end of radiotherapy, administered fractionation, and dose) on the basis of the paper-based health record and the treatment planning systems I-plan RT ® or Pinnacle 3® . For evaluating the data elements (PTV volume, minimum/maximum dose, coverage PTV, dose distribution of risk organs), the support of the medical physicist (approximately 20 h) and a physician (5 h) to analyze actual MR or CT imaging is still required.
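To put the two workflows side by side, the reported hours can be compared directly. The sketch below uses only figures from the text; adding the remaining expert time (approximately 20 h physicist, 5 h physician) to the electronic workflow is our reading for illustration, not a total stated by the authors.

```python
manual_total_h = 637 + 26 + 5   # scientific assistant + physicist + physician
dwh_total_h = 30 + 5            # DWH query + administrative activities
expert_h = 20 + 5               # physicist (approx.) + physician, still manual

# Reduction when only the DWH effort is counted.
reduction_pct = round(100 * (manual_total_h - dwh_total_h) / manual_total_h, 1)

# Reduction when the remaining expert time is added to the electronic workflow.
reduction_with_experts_pct = round(
    100 * (manual_total_h - dwh_total_h - expert_h) / manual_total_h, 1
)
```

Under these assumptions, the reduction is 94.8% when only the DWH effort is counted and 91.0% when the remaining expert hours are included.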

DISCUSSION
The purpose of this study is to verify possible benefits of a clinical DWH for retrospective analysis in the field of radiation oncology.
We compared two different methods of collecting routine clinical data for secondary use in scientific retrospective analysis: manually and electronically using the DWH.
In summary, our results indicate that the electronically supported data retrieval (DWH) yielded an average of 93.9% correct data, a significantly better (p = 0.009) result than manual data retrieval (91.2%). Using a research database (DWH) replaces manual activities and offers the ability to significantly reduce data collection time and labor while improving data quality. However, data integrity depends on the quality of structured routine clinical documentation as well as on the system requirements for accessing medical data in the clinical source systems. Furthermore, expert knowledge for the transformation of routine clinical data is necessary in practice.
In our study, manual data retrieval required considerably more overall working time (668 h) across all involved professional groups than implementing the DWH query (30 h). We needed the support of a physician (5 h) to manually analyze CT or MR imaging and a medical physicist (26 h) to evaluate necessary irradiation data elements (fractionation, dose distribution, coverage/PTV volume, minimum/maximum dose, dose distribution at risk organs). Up to now, the support of a physician (5 h) to analyze current MR or CT imaging is still required. In order to completely automate the tasks of the medical physicist for retrospective analysis (evaluating the data elements coverage/PTV volume, minimum/maximum dose, dose distribution at risk organs), the departmental planning systems I-plan® RT and Pinnacle 3® need to be made accessible to the DWH.
In addition, the long period of time necessary for retrieving data manually produced outdated databases and caused errors when transmitting data into an electronic format such as Microsoft Excel, which became evident in some cases of our study. Furthermore, data retrieval errors can easily be introduced because medical record data are not guaranteed to be accurate (e.g., incorrectly documented treatment date of radiation in the discharge letter of radiotherapy) and depend on the care and knowledge level of the scientific assistant. A related study by Roelofs et al. (19) that examined the benefit of a clinical DWH combined with tools for extraction of relevant parameters data for a radiotherapy trial supports this point of view. A DWH is beneficial for data collection time in addition to offering the ability to improve data quality.
Besides the benefits for data collection time and data quality, the strength of a DWH is its ability to combine data from multiple clinical source systems and make them easily accessible to researchers. However, before using routine data for research purposes, it is important to carefully verify these data and determine data integrity. In this context, Galster (20) reviewed existing barriers to reusing routine data and concluded that clinical data are not available when or where they are needed: even though the data are present, the use of the existing source is prohibited, or the data cannot be routinely used in their available form. In our study, regulatory requirements and institutional agreements had to be reconciled among the departments of the UKER involved in the patients' treatment in order to use clinical routine data for clinical research activities.
Next to the challenges of gaining access to multiple data sources, another major barrier to data reuse is the fact that routine data cannot be used in their available form. Usually, clinical data are distributed across several tables in a generic form with coded values (21). In our analysis, some data (e.g., tumor localization, histology/pathology) are semi-structured values (mostly free-text format) and therefore cannot be used for automatic analysis. Data recorded in structured fields are more readily extracted from an EHR than data recorded in free-text notes. Therefore, expert knowledge for the transformation of these data is necessary, and the accuracy of database queries mainly depends on the specific SQL statement. In addition, EHR data are frequently recorded inconsistently in a variety of formats that are complex, inaccurate, and often incomplete (22). For our study, it is a necessary condition that medical data are recorded completely in a specific data schema in order to automatically capture as much information as possible for retrospective analysis.
Furthermore, EHRs often do not tell a complete patient story, whether they are those of a single institution or those aggregated across institutions (23). An example of this problem in our study is the date of death, which is only documented in the clinical source system (EHR) for patients who died at the UKER. Moreover, information about external imaging is not routinely documented in a coded form in the EHR and is therefore not accessible to database queries. Consequently, medical details from external sources (e.g., life status in the GTDS®, imaging at an external hospital) must be requested or made available for automated data abstraction. This would be worthwhile in order to determine a patient's life status, as an electronic life-status comparison with the residents' registration offices has been prohibited for privacy reasons since 2008 and an amendment to the Bavarian Cancer Registry is envisaged for 2016 (24). To keep the medical routine data up to date, we send a specially designed questionnaire to the patients, which they complete themselves, in order to assess the health-related outcome.
Additionally, routine clinical documentation in the primary source systems affects the research outcome: data quality for retrospective analysis is only as good as the routine clinical documentation in the primary source systems e.g., EHR. Therefore, Kessel et al. (5) have developed a professional data-based documentation system for analysis purposes where information about radiation therapy, diagnostic images, and dose distributions has been imported into a web-based system. They showed that the central storage of data outside of EHR leads to benefits of digital management, data analysis, and reusability of the results. In this context, Kirrmann et al. (9)  developed and described a flexible browser based reporting and visualization system for clinical and scientific use by linking web-services/MOSAIQ ® , the physician letter system MEDATEC, and central server MiraPlus (laboratory, pathology and radiology). They reported that all relevant data were available at all times in a simple manner, which improved their effectiveness resulting in a considerable amount of time saving.
In this context, one benefit of our retrospective analysis was the gain of access to radiotherapy data from the clinical source system MOSAIQ®. Besides the data sets "beginning and end of radiotherapy" for evaluating treatment outcomes of patients with meningioma, we also extracted the irradiation parameters "planned and effectively implemented fractionation and dosage distribution" from the existing primary source (OIS). Because the linear accelerator and the OIS both use validation rules for data entry in the primary source system, original routine data are not subsequently changed. As we have shown in our analysis, using original and unprepared data leads to a higher percentage of accurate data.
A summary of the described limitations and potential solutions using a DWH is shown in Table 4.
Although only a selected data set from the evaluation of patients with meningioma was examined and not all data were directly available in a DWH, our present study highlights the benefit of electronically supported data retrieval for secondary use. Thus, our goal is to adapt our approach to other types of tumors in radiation oncology and extract more parameters from the existing routine care documentation systems.
CONCLUSION
Our present study shows that a DWH is particularly beneficial for retrospective analysis in the field of radiation oncology. Routine clinical data for a large patient group can be provided ready for analysis to the scientific operator, and data collection time can be reduced significantly. Furthermore, using a DWH provides the ability to improve data quality for retrospective analysis; thus, future research can be simplified. However, expert knowledge for the transformation of routine clinical data is still necessary, and the quality of structured routine clinical documentation in the CISs as well as the system requirements allowing access to medical data also affect the outcome.

ETHICS STATEMENT
This study was carried out in accordance with the recommendations of the ethical review board of the Friedrich-Alexander-University of Erlangen-Nuremberg (FAU) with written informed consent from all subjects. All subjects gave written informed consent in accordance with the Declaration of Helsinki. The protocol was approved by the ethical review board of the Friedrich-Alexander-University of Erlangen-Nuremberg (FAU) (reference number 347_16 Bc).
AUTHOR CONTRIBUTIONS
SR: conducted data analysis, described throughout the manuscript, and major contributor to the writing of the manuscript and literature search. RF: clinical oncologist and principal of the research organization, involved in the design of the study, and reviewed the manuscript. TG: made substantial contributions to the acquisition of data, developed the data warehouse report, and reviewed the manuscript. H-UP: was involved in the design of the study and reviewed the manuscript. DL: major contributor to the writing of the manuscript, supervised the study, and major contributor to organization of the data analysis and manuscript. All authors read and approved the manuscript.
REFERENCES