Unsatisfactory Reporting Quality of Clinical Trials Evaluating Immune Checkpoint Inhibitor Therapy in Cancer

Background: An increasing number of immune-oncology trials have been conducted across various cancers, yet the reporting quality of these trials, and the characteristics associated with higher reporting quality, remain unclear.
Objective: This study aimed to evaluate the reporting quality of immune-oncology trials.
Methods: PubMed and the Cochrane Library were searched to identify all English-language publications of clinical trials assessing immunotherapy for cancer. Reporting quality was evaluated with an 11-point quality score derived from the Trial Reporting in Immuno-Oncology (TRIO) statement, comprising a 6-point efficacy score and a 5-point toxicity score. Linear regression was used to identify characteristics associated with higher scores.
Results: Of the 10,169 studies screened, 298 immune-oncology trial reports were included. The mean quality score, efficacy score, and toxicity score were 6.46, 3.61, and 2.85, respectively. The most commonly well-reported items were response evaluation criteria (96.0%) and toxicity grade (98.7%), followed by Kaplan-Meier survival analyses (80.5%). Treatment details beyond progression (12.8%) and toxicity onset time and duration (7.7%) were poorly reported. Multivariate regression revealed that higher impact factor (IF) (IF >20 vs. IF <5, p < 0.001), specific tumor type (p = 0.018 for lung, p = 0.021 for urinary system, vs. pan cancer), and type of immune checkpoint blocking agent (p < 0.001 for anti-PD-1 or multiagents, vs. anti-CTLA-4) were independent predictors of a higher quality score. Similar independent predictors were found for a higher efficacy score. Only IF >20 was significantly associated with a higher toxicity score (p < 0.001).
Conclusion: Immune-oncology trial reports had an unsatisfactory quality score, especially in the reporting of treatment details beyond progression and of toxicity onset time and duration. High-IF journals had better reporting quality. Further improvement of trial reporting is warranted to support the benefit-risk assessment of immunotherapy.

Clinical trials are considered essential to advancing and evaluating the use of immune checkpoint blockade (ICB) in cancer treatment (5). Publications in biomedical journals are the key means of disseminating the design, conduct, results, and conclusions of these trials. A published report should allow the reader to fully understand the trial and to make informed judgments about its results. Thus, a unified standard is needed to ensure the quality of these reports.
The Consolidated Standards of Reporting Trials (CONSORT) statement provides guidance to authors on the essential items that should be included in trial reports, and it can also be applied to immune-oncology (IO) trials (6,7). However, because of their distinct mechanisms, IO therapies show efficacy and toxicity patterns that differ from those of traditional cancer treatments such as chemotherapy, which may call for additional considerations in reporting guidelines for IO clinical trials (8,9). For this reason, the Trial Reporting in Immuno-Oncology (TRIO) statement was developed by the American Society of Clinical Oncology (ASCO) and the Society for Immunotherapy of Cancer (SITC) to improve interpretation and comparison across IO trials (10,11). Previous studies have evaluated the quality of randomized clinical trial (RCT) reports against the CONSORT statement (12-14), but none has specifically evaluated the reporting quality of IO trials. Therefore, the purpose of this study was to evaluate the reporting quality of IO trials based on the TRIO statement. We also investigated which characteristics of the publications were associated with higher quality in IO trial reporting.

Quantitative Scoring System for Quality of Trial Reporting in Immuno-Oncology
A trial reporting of immuno-oncology quality score (TRIOQS) based on the TRIO statement was defined by two of the authors (CC and SH). The score covered the recommendations of the TRIO statement (Table 1), except the 12th, which concerns the reporting of combination or sequencing of immunotherapies and applies only to trials of that kind. The scoring system had two parts: an efficacy score (TRIOQS-E, items 1-6 in Table 1) and a toxicity score (TRIOQS-T, items 7-11 in Table 1). Each item was scored 1 if it was reported and 0 if it was not reported at all, and all items were weighted equally. For recommendations with several subcomponents, a score of 1 was given if any one of them was reported.
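The scoring rule described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function names, the dictionary layout, and the example publication are all hypothetical, and only the item ranges (1-6 efficacy, 7-11 toxicity), the binary scoring, the equal weights, and the "any subcomponent" rule come from the text.

```python
from typing import Dict, List

# Item numbers follow the text: items 1-6 are efficacy, items 7-11 are toxicity.
EFFICACY_ITEMS = range(1, 7)
TOXICITY_ITEMS = range(7, 12)


def item_score(subcomponents_reported: List[bool]) -> int:
    """Return 1 if any subcomponent of the item was reported, otherwise 0."""
    return 1 if any(subcomponents_reported) else 0


def trioqs(reported: Dict[int, List[bool]]) -> Dict[str, int]:
    """Sum equally weighted binary item scores into TRIOQS-E, TRIOQS-T, and TRIOQS."""
    # Items missing from the input are treated as not reported (score 0).
    scores = {item: item_score(reported.get(item, [])) for item in range(1, 12)}
    efficacy = sum(scores[i] for i in EFFICACY_ITEMS)
    toxicity = sum(scores[i] for i in TOXICITY_ITEMS)
    return {"TRIOQS-E": efficacy, "TRIOQS-T": toxicity, "TRIOQS": efficacy + toxicity}


# Hypothetical publication (illustrative values only, not data from the study).
example = {
    1: [True],          # item reported
    5: [False, True],   # one of two subcomponents reported, so the item scores 1
    6: [True],
    9: [True],
    11: [False, False],  # no subcomponent reported, so the item scores 0
}
result = trioqs(example)
```

Because every item carries the same weight, the total is simply the count of items with at least one reported subcomponent, which is why the score is an integer from 0 to 11.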
The scoring system was piloted on 10 randomly selected publications (110 items) by two authors (CC and YZ) who were blinded to each other's evaluation results. Among 110 items, 6 discrepancies were identified, and all were successfully resolved by consensus. Based on this consensus, the two authors (CC and YZ) evaluated the remaining publications.

Table 1 (partial; only the following rows are available)

TRIO recommendation: Include spider plots or swimmer plots in efficacy descriptions to better report kinetics of response.
Scoring criterion: Spider plots or swimmer plots were presented to report response in the main text or appendix.

TRIO recommendation: If the prespecified clinical diagnoses used in data collection belong to categories such as "immune-related adverse events" or "adverse events of special interest," report how these terms are defined and why these categories were selected for trial reporting.
Scoring criterion: The terms "immune-related adverse events" (irAE) or "adverse events of special interest" (AEOSI) were defined in the main text or appendix.

TRIO recommendation: Report the scientific hypothesis for the combination or sequence on the basis of preclinical and/or clinical data as well as the rationale for the selection of the particular dose(s) and sequence of agents.
Scoring criterion: This item was not evaluated, because it applies specifically to clinical trials with combination or sequencing of immunotherapies (NA).

NA, not applicable.

Trial Characteristic Selection and Definition
Several trial characteristics that could affect the quality score were selected.
Year of publication was extracted directly as a continuous variable. Journal impact factor was taken from the 2018 values and classified into four groups: <5, 5 to 10, 10 to 20, and >20. Trial phase was grouped as phase I (I or I/II), phase II (II or II/III), and phase III (III or III/IV). Trials were considered industry funded if they received any form of industry funding. The number of participating centers was divided at the median into three groups: 1 to 12, 13 to 246, and unknown. Intercontinental trials were those that recruited patients from more than one continent. Nonintercontinental trials were classified as conducted in North America only, in other regions (Asia only, Europe only, or Oceania only), or in unknown regions. Tumor types were divided into five categories: pan cancer, lung cancer, melanoma, urinary system cancer, and other cancers. By mechanism, immune checkpoint blocking agents were classified as anti-CTLA-4, anti-PD-1, anti-PD-L1, or any combination of these (multiagents). By treatment strategy, immunotherapy was classified as used alone or combined with other therapy.
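As an illustration of the grouping rules above, the following Python sketch maps raw values to the categories used in the analysis. The function names are hypothetical, and the handling of boundary values is an assumption: the text does not say which group an impact factor of exactly 10 belongs to, so the sketch places 5 and 10 in "5 to 10" and 20 in "10 to 20".

```python
from typing import Optional

# Mixed-phase trials are assigned to the lower phase, as described in the text.
PHASE_GROUPS = {
    "I": "phase I", "I/II": "phase I",
    "II": "phase II", "II/III": "phase II",
    "III": "phase III", "III/IV": "phase III",
}


def phase_group(phase: str) -> str:
    """Map a reported phase label such as 'II/ III' to its analysis group."""
    return PHASE_GROUPS[phase.replace(" ", "").upper()]


def impact_factor_group(impact_factor: float) -> str:
    """Assign an impact factor to one of the four groups.

    Assumption: boundary values 5 and 10 go to '5 to 10' and 20 goes to '10 to 20'.
    """
    if impact_factor < 5:
        return "<5"
    if impact_factor <= 10:
        return "5 to 10"
    if impact_factor <= 20:
        return "10 to 20"
    return ">20"


def centers_group(n_centers: Optional[int]) -> str:
    """Split the number of participating centers at the reported median of 12."""
    if n_centers is None:
        return "unknown"
    return "1 to 12" if n_centers <= 12 else "13 to 246"
```

Keeping the rules in small functions like these makes the boundary decisions explicit, which matters because the impact factor group turned out to be one of the strongest predictors in the regression.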

Statistical Analysis
The TRIOQS was calculated as the sum of the item scores in Table 1 and expressed as an integer from 0 to 11. TRIOQS scores were described using the mean and standard error (SE). Single-item frequencies were compared between subgroups with chi-square tests. Univariate and multivariate linear regression analyses were used to identify trial characteristics associated with higher TRIOQS. Because we wished to include as many characteristics associated with reporting quality as possible, the multivariable regression included all of the covariates listed above. Violin plots were used to display differences in TRIOQS among subgroups of statistically significant characteristics. Statistical analyses were performed using R software (http://www.R-project.org/). All tests were two-tailed, with p < 0.05 considered statistically significant.
Trials' characteristics are listed in Table 2. The number of published trials increased almost monotonically by year. More than half of the trials (n = 173, 58%) were published in journals with IF >20, including Lancet Oncology (n = 50, 16

Quality Score According to TRIO Statement
The mean TRIOQS was 6.46 on an 11-point scale (range, 1 to 11; 95% CI, 6.23 to 6.69). Two hundred thirty-eight trials (79.9%) had a score of 5 to 9, while 27 trials (9.1%) had a score ≤3. Only two trials had a score of 11. The mean TRIOQS-E was 3.61 on a 6-point scale (range, 0 to 6; 95% CI, 3.45 to 3.77), with three trials scoring 0 and 22 trials scoring 6. The mean TRIOQS-T was 2.85 on a 5-point scale (range, 0 to 5; 95% CI, 2.72 to 2.98), with four trials scoring 0 and 14 trials scoring 5.
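For readers who want the spread behind these intervals, the standard error and an approximate standard deviation can be backed out of the reported 95% CI. The sketch below assumes a symmetric normal-approximation interval (mean ± 1.96 × SE) and n = 298; the text does not state how the intervals were computed, so the derived values are approximations and the function name is hypothetical.

```python
import math


def se_from_ci(lower: float, upper: float, z: float = 1.96) -> float:
    """Standard error implied by a symmetric interval of the form mean +/- z * SE."""
    return (upper - lower) / (2 * z)


n_trials = 298  # number of included trial reports

# Reported 95% CI for the overall TRIOQS: 6.23 to 6.69.
se_total = se_from_ci(6.23, 6.69)
# Under the same assumption, SD = SE * sqrt(n).
sd_total = se_total * math.sqrt(n_trials)
```

Under these assumptions the implied standard error of the total score is about 0.12 and the implied standard deviation about 2 points, which fits the observation that most trials scored between 5 and 9.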
The most commonly well-reported items were response evaluation criteria (item 1, 96.0%) and toxicity grade (item 9, 98.7%), followed by Kaplan-Meier survival analyses (item 6, 80.5%). Spider or swimmer plots were presented more frequently in phase I trials (n = 74 of 124 trials, 59.7%) than in phase II trials (n = 47 of 104 trials, 45.2%) and phase III trials (n = 19 of 70 trials, 27.1%; p < 0.001). Criteria for continuing treatment beyond progression and definitions of new adverse event terms (irAE or AEOSI), both of which are unique to immune-oncology therapy, were clearly described in 50% and 53.0% of trials, respectively. However, treatment details beyond progression (item 5, 12.8%) and toxicity onset time and duration (item 11, 7.7%) were poorly reported. Notably, in almost all trials the reasons for selecting these criteria were not fully explained.
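The comparison of spider or swimmer plot reporting across trial phases can be checked from the counts given above. The following sketch computes a Pearson chi-square statistic for the 3 × 2 table using only the Python standard library. It is not the authors' R code, and whether they applied any correction is not stated. The p-value uses the fact that the chi-square survival function with exactly 2 degrees of freedom is exp(-x/2).

```python
import math

# (trials presenting spider/swimmer plots, total trials) per phase, from the text.
counts = {"phase I": (74, 124), "phase II": (47, 104), "phase III": (19, 70)}


def chi_square_kx2(table):
    """Pearson chi-square statistic and degrees of freedom for a k x 2 table."""
    total_yes = sum(yes for yes, _ in table.values())
    total_n = sum(n for _, n in table.values())
    statistic = 0.0
    for yes, n in table.values():
        # Compare observed and expected counts for "presented" and "not presented".
        for observed, column_total in ((yes, total_yes), (n - yes, total_n - total_yes)):
            expected = n * column_total / total_n
            statistic += (observed - expected) ** 2 / expected
    return statistic, len(table) - 1


statistic, dof = chi_square_kx2(counts)

# The closed form below holds only for 2 degrees of freedom (three groups, two outcomes).
assert dof == 2
p_value = math.exp(-statistic / 2)
```

With these counts the statistic comes out at roughly 19 on 2 degrees of freedom, giving a p-value well below 0.001, consistent with the reported result.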

Characteristics Associated With Reporting Quality
The results of the univariable and multivariable linear regressions are listed in Table 3. Although all characteristics were statistically significant in univariable analysis, multivariable regression revealed only higher IF (IF >20 vs. IF <5, p < 0.001), specific tumor type (lung vs. pan cancer, p = 0.018; urinary system vs. pan cancer, p = 0.021), and type of immune checkpoint blocking agent (anti-PD-1 vs. anti-CTLA-4, p < 0.001; multiagents vs. anti-CTLA-4, p < 0.001) as independent predictors of higher TRIOQS.
Specifically, articles in journals with IF >20 had a TRIOQS on average 1.15 points higher than those in journals with IF <5 (Figure 2A). Publications on lung cancer and urinary system cancer had a TRIOQS that was 1.08 and 1.12 points higher, respectively, than those on pan cancer. Publications on melanoma and other cancers had a TRIOQS that tended to be 0.81 and 0.79 points higher, respectively, than those on pan cancer (Figure 2B). The TRIOQS of trials of anti-PD-1 agents was higher than that of trials of anti-CTLA-4 agents by a mean of 1.17 points, and trials of multiagents had a TRIOQS on average 1.5 points higher than trials of anti-CTLA-4 agents (Figure 2C). Similar independent predictors were found for higher TRIOQS-E in multivariable regression, including IF >20 (vs. IF <5, p = 0.034), phase II (vs. phase I, p = 0.021), specific cancer (lung cancer vs. pan cancer, p = 0.030; urinary system cancer vs. pan cancer, p = 0.013; other cancers vs. pan cancer, p = 0.012), and type of immune checkpoint blocking agent (anti-PD-1, anti-PD-L1, and multiagents vs. anti-CTLA-4, all p < 0.001) (Table 4). However, only IF >20 was significantly associated with a higher TRIOQS-T (p < 0.001) (Table 5).

DISCUSSION
Because no reporting guidelines specific to IO clinical trials existed until the TRIO statement (10,11) was published more than 2 years ago, the reporting quality of IO clinical trials has been unsatisfactory. Concerns have been raised that a more structured and transparent approach is needed for benefit-risk assessment when IO treatment is evaluated as a new therapy. Standardized reporting is therefore essential. This is the first systematic evaluation of the reporting quality of IO clinical trials of cancer treatment in accordance with the TRIO statement.
Immune-oncology trials had an unsatisfactory reporting quality score on the 11-item scoring system derived from the TRIO statement, which consists of a six-item efficacy score and a five-item toxicity score. Items that are commonly reported in traditional clinical trials, such as response evaluation and toxicity grade, were also well described in IO trials. This may be attributable to the well-established CONSORT guidance for clinical trials. However, treatment details beyond progression and toxicity onset time and duration, which are crucial for evaluating the efficacy and toxicity of cancer immunotherapy, were poorly reported.
Abbreviations: TRIOQS, trial reporting of immuno-oncology quality score; CTLA-4, cytotoxic T-lymphocyte antigen 4; PD-1, programmed cell death protein 1; PD-L1, programmed death-ligand 1.
Pseudo-progression, a phenomenon unique to IO treatment, denotes the appearance of new lesions (usually with shrinkage of the baseline index tumor burden) or an initial increase in index lesions followed by index lesion response on clinical or radiographic assessment (15,16). For this reason, IO clinical trials often allow patients to continue therapy beyond objective progression, and half of the publications reported the criteria for continuing treatment. However, the details of treatment after progression were seriously underreported (12.8% reported). This failure may reflect authors' insufficient awareness of the importance of continued therapy, and it may limit the ability to make a comprehensive benefit assessment.
Although the toxicity items specific to IO therapy, namely the definition of "immune-related adverse events" or "adverse events of special interest" and the management of IO toxicity, were relatively well described in over half of the publications, the onset and duration of IO toxicity were rarely reported. Unlike the toxicity of traditional cancer treatment, the toxicity of IO therapy can be delayed in onset and long lasting (17,18). Therefore, reporting the onset and duration of toxicity is arguably clinically important for risk-benefit assessment and useful for the design of subsequent IO trials.
Whereas the reporting quality of clinical trials of traditional chemotherapy has been thoroughly evaluated since the CONSORT guideline was proposed, the reporting quality of IO clinical trials has not been assessed against the TRIO statement, which was designed specifically for IO trials. This divergence may reflect insufficient awareness of the differences between IO and traditional chemotherapy clinical trials and slower uptake of the TRIO statement. Continued use of CONSORT alone cannot fully reflect the unique characteristics of IO clinical trials (19,20). Going further than CONSORT, TRIO has included toxicity reporting standards since it was first proposed. Notably, only four trials had no description of toxicity. It is also interesting that more than half of the trials were published in journals with IF >20 (n = 173, 58%).
Two trial reports received the highest score of 11 points (21,22). Both report multicenter randomized controlled phase III clinical studies and were published in the New England Journal of Medicine in 2015. Both are studies of nivolumab given as immunotherapy alone and were funded by Bristol-Myers Squibb. The tumor types studied were melanoma (CheckMate 066) and non-small-cell lung cancer (CheckMate 057), respectively. Although these two reports meet the requirement of each item according to the TRIO statement, some subcomponents are still insufficiently covered. Neither mentioned the rationale for the criteria chosen to evaluate response to therapy, nor the rationale for selecting the clinical diagnoses collected under categories such as "immune-related adverse events" or "adverse events of special interest". Although both reported the number (proportion) of patients who were treated beyond progression and efficacy after initial progression, they did not mention the duration of treatment beyond progression or the emergence of new toxicity. In addition, the report of CheckMate 066 did not mention the time of onset of IO toxicity.
Factors associated with higher reporting scores were also investigated. Publications in journals with IF >20 had higher quality scores for both efficacy and toxicity assessment, which might be related to the journals' own submission requirements and review systems (13). Trials in a specific cancer, such as lung cancer or urinary system cancer, had higher quality scores than pan-cancer trials. Most trials designed for a specific cancer category aimed to confirm the clinical efficacy of the IO treatment for that disease, rather than serving the exploratory purpose typical of pan-cancer trials. Their reports on efficacy, and on the trial as a whole, were therefore more detailed. At the same time, this focus may reduce authors' attention to toxicity, which could explain why the toxicity quality score did not differ between specific cancers and pan cancer. Compared with trials of anti-CTLA-4 agents, trials involving other agents had better quality scores, especially efficacy scores. This is probably because the other agents became available later than anti-CTLA-4 agents, when IO clinical trials were relatively mature.
Although our study comprehensively assessed the reporting quality of IO clinical trials, its limitations should also be addressed. First, this study did not compare the TRIO statement with the traditional CONSORT statement, mainly because its purpose was to evaluate all IO trials rather than only randomized controlled trials. Second, the quality score gave equal weight to each TRIO item, which may understate some important items or overemphasize some less important ones. Finally, for recommendations with several subcomponents, we assigned values only to items, not to subcomponents. This makes the evaluation criteria broad, but it keeps them simple and practical to apply to most trials.
In summary, our findings show that IO trials had an unsatisfactory reporting quality score as assessed by the TRIO statement, especially in the reporting of treatment details beyond progression and of toxicity onset time and duration. High-IF journals have better reporting quality. Studies focused on a specific cancer and studies of anti-PD-1 or anti-PD-L1 agents have higher efficacy quality scores. As a first step toward providing an overall landscape of IO trial reporting quality, we expect that this study may shed light on future improvement of IO trial reporting for better benefit-risk assessment of immunotherapy.