Development of the First Value Assessment Index System for Off-Label Use of Antineoplastic Agents in China: A Delphi Study

Objective To develop the first value assessment index system for off-label use of antineoplastic agents in China. Methods A modified two-round Delphi method was employed to reach consensus among a multidisciplinary panel of experts, via questionnaires or interviews, by collecting their feedback to inform the next round, exchanging their individual knowledge, experience, and opinions anonymously, and resolving uncertainties. Results The expert's positive coefficient was 94.74% in the first round and 100.00% in the second round. In the first round, the expert's authority coefficient was ≥ 0.80 for the majority of the 61 indicators (85.2%; range, 0.70 to 0.89; mean = 0.84), and the coefficient of variation for all 61 indicators was < 22% (range, 11.67% to 21.74%; mean = 17.4%). After two rounds, the mean expert's authority coefficient rose to 0.85 (range, 0.75 to 0.90), and the coefficient of variation for all indicators was < 20% (range, 10.49% to 19.71%; mean = 15.97%). The P-values of Kendall's W test were all < 0.001 in each round. At the end of two rounds, the W-value for concordance was 0.395 (χ² = 347.494, P < 0.0001). The final value assessment index system comprised eight domains, 21 subdomains, and 56 indicators. The weight and combination weight of each domain were 0.4211 for therapeutic value, 0.1678 for source and type of evidence, 0.0961 for public feedback/comments, 0.0894 for novelty in drug discovery, 0.0689 for grading of evidence recommendation, 0.0578 for consistency of evidence-based results, 0.0561 for disease burden, and 0.0428 for ratio of composition/integration. Conclusion The Delphi method was found to be a scientific and credible way to develop the proposed value assessment index system. This value assessment index system is highly appropriate for off-label use of antineoplastic agents in China.


INTRODUCTION
The growing global cancer burden has accelerated innovation in treatment, including an influx of new drugs. Nevertheless, skyrocketing healthcare costs, especially for antineoplastic agents, combined with modest survival gains, raise the question of whether new anticancer drugs are necessarily cost-effective (Kelly and Smith, 2014; Tefferi et al., 2015; Cohen, 2017).
Value is a relatively new and evolving term with eight distinct definitions in the Oxford English Dictionary (Oxford English Dictionary, ). It is recognized by consensus as a multidimensional and dynamic concept, although its definition may vary among stakeholders, including physicians, payers, and patients (Promoting Value, Affordability, and Innovation in Cancer Drug Treatment, 2018).
To promote the use of high-value drugs, a number of organizations, including the American Society of Clinical Oncology (ASCO) (Schnipper et al., 2015; Schnipper et al., 2016), the European Society for Medical Oncology (ESMO) (Cherny et al., 2015; Cherny et al., 2017), and the National Comprehensive Cancer Network (NCCN) (National Comprehensive Cancer Network, ), have developed frameworks to assess antineoplastic agents either quantitatively or qualitatively, involving stakeholders (e.g., physicians, patients, and healthcare insurers). However, there is no universally accepted framework, and unfortunately there are no value assessment frameworks in developing countries such as China, where resources are scarce and demand for healthcare services is rising.
Off-label use of antineoplastic agents, sometimes the only option for patients with advanced cancer in real-world settings, is an unavoidable challenge in clinical practice that urgently needs to be addressed. At present, however, there is a clear lack of general specifications and technical criteria for its evaluation. As with "new drugs" or "new treatments" compared against a standard therapeutic regimen, it is feasible to use a value assessment framework to comprehensively evaluate off-label use of antineoplastic agents and thereby resolve the technical bottleneck in its evaluation.
Although there are compendiums of indicators for value assessment, there are currently no validated indicators to guide implementation and value evaluation of off-label use of antineoplastic agents. The aim of the present study was to establish the first value assessment index system for off-label use of antineoplastic agents in China using the modified Delphi method, which involves an iterative process and has been widely applied in diverse areas of healthcare.

METHODS
To develop a value assessment index system for off-label use of antineoplastic agents in China, a modified Delphi method was employed to reach consensus among a multidisciplinary panel of experts, via questionnaires or interviews, by collecting their feedback to inform the next round, exchanging their individual knowledge, experience, and opinions anonymously, and resolving uncertainties (Hasson et al., 2000). Agreement was considered established once consensus was reached in the final round (Iqbal and Pipon Young, 2009; Birko et al., 2015). The flowchart of the Delphi process is shown in Figure 1, and a detailed description of the proposed system is presented in the Appendix.

RESULTS
In total, two rounds of consultation were carried out before consensus was reached.

Characteristics of the Experts
Nineteen candidates who met the predefined expert criteria were enrolled for consultation in a multidisciplinary panel from geographically diverse areas, including the north, south, and west of China. Eighteen experts agreed to participate in the first and second rounds. All the experts had at least nine years of experience (range, 9-29 years; mean, 18.2 years). Experts were predominantly (n = 17) working in hospitals and were employed in clinical pharmacy, pharmaceutical affairs, oncology, evidence-based medicine, clinical epidemiology and statistics, or pharmacoeconomics. Demographic characteristics of participants, including gender, profession, and the highest level of education, were also collected and are shown in Table 1.

Expert's Positive Coefficient
Overall, the expert's positive coefficient (C_aj) was 94.74% in the first round and 100.00% in the second round.
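As a rough check of these figures, the sketch below assumes that the positive coefficient is simply the questionnaire response rate (returned divided by distributed), which is the usual convention but is not spelled out in the text. The counts come from the panel description (19 enrolled, 18 participating in both rounds); the function name is hypothetical.

```python
def positive_coefficient(returned: int, distributed: int) -> float:
    """Response rate in percent (assumed definition of the positive coefficient)."""
    if distributed <= 0:
        raise ValueError("distributed must be positive")
    return 100.0 * returned / distributed


# Round 1: 19 experts enrolled, 18 responded.
round1 = positive_coefficient(returned=18, distributed=19)
# Round 2: all 18 participating experts responded.
round2 = positive_coefficient(returned=18, distributed=18)

print(f"Round 1: {round1:.2f}%  Round 2: {round2:.2f}%")
```

Under this assumption, the reported 94.74% and 100.00% correspond to 18/19 and 18/18, respectively.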

Expert's Authority Coefficient
In the first round, the expert's authority coefficient (C_r) was ≥ 0.80 for the majority of the 61 indicators (85.2%; range, 0.70 to 0.89; mean = 0.84), and the coefficient of variation (CV) for all 61 indicators was < 22% (range, 11.67% to 21.74%; mean = 17.4%). After two rounds, C_r for the majority of indicators was higher than in the first round. The average C_r rose to 0.85 (range, 0.75 to 0.90), and CV for all the indicators was < 20% (range, 10.49% to 19.71%; mean = 15.97%), indicating that consensus had been achieved. Table 2 compares the values of C_r between the first and second rounds.

Degree of Coordination of Experts' Opinions
P-values of Kendall's W test were all < 0.001 for each round. At the end of the second round, the W-value for concordance of the final indicators was 0.395 (χ² = 347.494, P < 0.0001), which was statistically significant at the level of α = 0.05, indicating that consensus could be reached among the experts (Table 3).
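For readers unfamiliar with the statistic, the following is a minimal sketch of Kendall's W with the usual correction for tied ranks and the common chi-square approximation χ² = m(n − 1)W, where m is the number of experts and n the number of indicators. It is not the authors' SPSS procedure, the ratings are toy values rather than study data, and the function names are hypothetical.

```python
from collections import Counter


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i..j
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def kendalls_w(scores):
    """scores[e][k] is the rating of indicator k by expert e.

    Returns (W, chi_square) using the tie-corrected formula.
    """
    m, n = len(scores), len(scores[0])
    rank_rows = [average_ranks(row) for row in scores]
    rank_sums = [sum(row[k] for row in rank_rows) for k in range(n)]
    mean_sum = m * (n + 1) / 2
    s = sum((r - mean_sum) ** 2 for r in rank_sums)
    # Tie correction: sum of (t^3 - t) over tie groups within each expert.
    ties = sum(t ** 3 - t for row in scores for t in Counter(row).values())
    w = 12 * s / (m ** 2 * (n ** 3 - n) - m * ties)
    return w, m * (n - 1) * w


# Toy Likert ratings: 3 experts x 4 indicators (illustrative only).
toy_scores = [
    [5, 4, 4, 3],
    [5, 5, 4, 3],
    [4, 5, 3, 3],
]
w, chi_square = kendalls_w(toy_scores)
print(f"W = {w:.3f}, chi-square = {chi_square:.3f} (df = {len(toy_scores[0]) - 1})")
```

W ranges from 0 (no agreement) to 1 (complete agreement); the chi-square value is compared against a chi-square distribution with n − 1 degrees of freedom to obtain the P-value.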

A Value Assessment Framework for Off-Label Use of Antineoplastic Agents
During the first round, the experts were invited to rate 61 candidate indicators identified in the questionnaire. The second round was held to discuss the answers from the first-round survey. After two rounds, consensus was reached on deleting five indicators, refining the wording of one indicator, and organizing the indicators into three levels. Changes to the indicators over the two rounds are summarized in Table 4.
Consequently, after two rounds of consultation with experts, we generated an expert consensus on the final value assessment index system for off-label use of antineoplastic agents, which comprised eight domains, 21 subdomains, and 56 indicators (Table 5).

Main Findings
To our knowledge, this is the first Delphi-based study performed among a diverse panel of experts to develop a value assessment index system for off-label use of antineoplastic agents in China, yielding eight domains, 21 subdomains, and 56 well-defined indicators at the end of two rounds. We believe that our study has filled an important gap in value assessment for off-label use of antineoplastic agents and helps address the difficulties in knowledge and practice in developing countries (e.g., China).
Although our framework was developed in the Chinese context, we believe that it can be implemented in other countries for assessing adherence to best decision-making, practice, and management in off-label use of antineoplastic agents.

Strengths and Limitations
Our study has several strengths. Firstly, the eight domains in the framework were strongly endorsed by the experts. Disease burden and novelty in drug discovery covered different types of cancer burden on the state, society, and individuals, and also reflected the orientation of drug research and development policy in China. Source and type of evidence, grading of evidence recommendation, and consistency of evidence-based results condensed the methodological support of evidence-based medicine for the present study and emphasized the importance of evidence and patients' decision-making. These key domains were not proposed in other value assessment frameworks, and they can therefore be used on a global scale. Secondly, we invited 18 well-known experts in related fields who concentrated on clinical pharmacy, pharmaceutical affairs, oncology, evidence-based medicine, clinical epidemiology and statistics, and pharmacoeconomics, 16 of whom held doctoral or master's degrees. The number of experts should be appropriately selected for a Delphi-based study (Iqbal and Pipon Young, 2009). The expert's positive coefficient was 94.74% in the first round and 100.00% in the second round, indicating that the experts were interested in this research and were willing to complete the survey within the specified timeframe. In the first round, the mean expert's authority coefficient was 0.84 and was ≥ 0.80 for the majority of the 61 indicators; it then rose to 0.85 in the second round, indicating a high degree of authority among the experts in the field of value evaluation of off-label use of antineoplastic agents and qualifying them for participation in the survey.
Thirdly, a reasonable weight setting is crucial for establishing an index system. In the present study, therapeutic value plays a leading role in the value assessment index system, demonstrating that multiple forms of evidence should be taken into account for value assessment, including but not limited to patient-reported outcomes, results from randomized controlled trials (RCTs), and real-world evidence as appropriate. The weight coefficients of the first-level indicators were in the following order: therapeutic value (0.4211), source and type of evidence (0.1678), public feedback/comments (0.0961), novelty in drug discovery (0.0894), grading of evidence recommendation (0.0689), consistency of evidence-based results (0.0578), disease burden (0.0561), and ratio of composition/integration (0.0428). The three highest weights were those of therapeutic value, source and type of evidence, and public feedback/comments, which condensed the methodological support of evidence-based medicine for the current study and emphasized the importance of evidence and patients' decision-making (Wild et al., 2016). Fourthly, open questions were raised during each round to gain more in-depth insight into the indicators, which promoted better-defined indicators as well as guidance for satisfactory practice, so that the index system could be more appropriate for the purpose of value assessment.
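The sketch below records the eight reported domain weights, confirms that they sum to 1, and shows one way they could be used to aggregate domain-level ratings into a single score. The weighted-sum aggregation and the example ratings are assumptions for illustration only; the text does not state how a composite score is to be computed.

```python
# First-level (domain) weights as reported in the text.
DOMAIN_WEIGHTS = {
    "therapeutic value": 0.4211,
    "source and type of evidence": 0.1678,
    "public feedback/comments": 0.0961,
    "novelty in drug discovery": 0.0894,
    "grading of evidence recommendation": 0.0689,
    "consistency of evidence-based results": 0.0578,
    "disease burden": 0.0561,
    "ratio of composition/integration": 0.0428,
}

weight_total = sum(DOMAIN_WEIGHTS.values())
top_three = sorted(DOMAIN_WEIGHTS, key=DOMAIN_WEIGHTS.get, reverse=True)[:3]


def composite_score(domain_scores, weights=DOMAIN_WEIGHTS):
    """Weighted sum of domain scores (assumed aggregation, not from the source)."""
    return sum(weights[name] * domain_scores[name] for name in weights)


# Hypothetical domain ratings on a 1-5 scale for one off-label indication.
example_scores = {name: 3.0 for name in DOMAIN_WEIGHTS}
example_scores["therapeutic value"] = 5.0

print(f"sum of weights = {weight_total:.4f}")
print("top three domains:", top_three)
print(f"composite score = {composite_score(example_scores):.4f}")
```

Because therapeutic value carries about 42% of the total weight, raising that single domain from 3 to 5 shifts the illustrative composite by roughly 0.84 points on a 5-point scale.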
However, the present study also has a number of limitations. Firstly, we did not include potential experts from some provinces or regions of China (i.e., the east of China), which could limit our results. Secondly, we did not include healthcare payers in the consultation, although they are important stakeholders in the value assessment of pharmacotherapy. Thirdly, the present study did not provide a face-to-face meeting for experts to discuss disagreements. Fourthly, Delphi consensus has its own limited validity. Fifthly, there were a great number of indicators, and it will therefore be necessary to remove indicators with low operability in the future on the basis of empirical research on different types of cancer and drugs.
Future research should aim at setting an international consensus on a value assessment index system for off-label use of antineoplastic agents using the Delphi method, as a contribution to robust evidence for governments' evidence-based decision-making, and at providing further insights into value and its relationship with drug prices to promote value-oriented medicine.

CONCLUSIONS
We used a Delphi-based method and process to develop and validate the first value assessment index system for off-label use of antineoplastic agents in China. The final 56 indicators need to be further tested, verified, and revised in clinical practice.

Producing an Evidence-Based Candidate Indicators List
In a comprehensive overview of global value assessment tools for drugs, a total of 12 eligible tools were identified (Jiang et al., 2019). The overview covered basic characteristics, key elements, and techniques in terms of value domains and metrics, evidence source/grading, and development process, and presented a detailed value assessment index system grouped into three levels, including eight domains, 22 subdomains, and 61 indicators. The eight domains were as follows: i) disease burden, ii) therapeutic value, iii) novelty in drug discovery, iv) source and type of evidence, v) grading of evidence recommendation, vi) consistency of evidence-based results, vii) ratio of composition/integration, and viii) public feedback/comments (Figure 2).

Recruitment of Experts in a Multidisciplinary Panel
A purposive, criterion-based sampling approach was adopted to convene a multidisciplinary panel of experts, who were given a detailed explanation of the study and its objectives to promote acceptance of the final index system. According to random sampling theory in mathematical statistics, the relationship between the standard deviation of the sample mean, σ_x̄, and the population standard deviation, σ, can be formulated as follows: σ_x̄ = σ / √m, where m represents the number of experts; as m increases, σ_x̄ decreases. Typically, a panel of 4 to 16 experts can yield satisfactory results, while for relatively important issues, such as indicator design or weight distribution, 15 to 30 experts need to be considered to create diversity of representation (Akins et al., 2005; Sema and Rafa, 2012). In contrast to conventional surveys, which generally aim for representativeness, Delphi does not use random sampling to recruit a panel of experts (von der Gracht, 2012; Khatwania and Karb, 2017).

Materials and Consultation
Each round of the questionnaire was delivered initially via e-mail and later by telephone. Experts were asked about the importance, operability, and sensitivity of each indicator, and rated each indicator on a Likert-type scale (Akins et al., 2005) from 1 point (extremely inappropriate) to 5 points (extremely appropriate). Familiarity and judgment scores were also recorded. Familiarity was rated on a Likert-type scale (Khatwania and Karb, 2017), where 1 point indicated that the expert was highly unfamiliar with the indicator and 5 points denoted that the expert was highly familiar with it. Judgment criteria included four aspects: work experience, theoretical analysis, understanding from domestic and foreign counterparts, and intuition, each scored from 1 to 3 points according to its degree of influence (Supplementary Tables 1 and 2). In addition, open questions were included in each round to encourage experts to revise, delete, or add indicators that they perceived to be necessary for the index system before the following survey round. The indicators were adjusted and supplemented according to the experts' comments, and the questionnaire was then modified for the next round following the qualitative feedback and statistical analysis of the previous one. The round in which consensus was achieved yielded the final index system. There is little evidence on the optimal number of Delphi rounds. Consensus is expected to increase with each additional round; however, potential bias also increases with experts' fatigue or attrition. Therefore, mean scores ≥ 0.70 and CV ≤ 25% were set as the consensus level in the present study (Monguet et al., 2017).

Statistical Analysis
The expert consultation database was established in EpiData (version 3.1) and exported to Excel 2016 spreadsheets, and all statistical analyses were carried out using SPSS 25.0 software (IBM, Armonk, NY, USA). Both descriptive statistics and quantitative analyses were undertaken. Each round of the Delphi survey was analyzed separately. A two-tailed P-value < 0.05 was considered statistically significant. Descriptive information about the experts, such as gender, level of education, and professional title, was recorded.
The mean scores and CV were calculated for each indicator and round. CV ≤ 25% indicated less variability of the experts' opinions.
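The per-indicator calculation can be sketched as follows, assuming that CV is the sample standard deviation divided by the mean, expressed as a percentage (the text does not say whether the sample or population standard deviation was used). The ratings are hypothetical and the function name is illustrative.

```python
from statistics import mean, stdev

CV_THRESHOLD = 25.0  # consensus level for CV (%) stated in the text


def coefficient_of_variation(ratings):
    """Return CV in percent: sample standard deviation divided by the mean."""
    return 100.0 * stdev(ratings) / mean(ratings)


# Hypothetical 1-5 ratings of one indicator by 18 experts (not study data).
ratings = [5, 5, 4, 4, 5, 4, 5, 3, 4, 5, 4, 4, 5, 5, 4, 3, 4, 5]

indicator_mean = mean(ratings)
indicator_cv = coefficient_of_variation(ratings)
low_variability = indicator_cv <= CV_THRESHOLD

print(f"mean = {indicator_mean:.2f}, CV = {indicator_cv:.2f}%, "
      f"within threshold: {low_variability}")
```

A lower CV means the experts' ratings cluster more tightly around the mean, which is why it is used here as a dispersion-based signal of agreement.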
The expert's positive coefficient (C_aj) and the expert's authority coefficient (C_r) were calculated. C_r involves two factors, the judgment criteria for the indicators (C_a) and the experts' familiarity with the indicators (C_s), which are related as follows: C_r = (C_a + C_s)/2. C_aj, C_r, and Kendall's W (W-value and P-value) were used to evaluate the expert's positive degree, the expert's authority degree, and the degree of coordination of experts' opinions, respectively (Khatwania and Karb, 2017). Mean scores ≥ 0.70 and CV ≤ 25% are recommended for Delphi studies and were set as the consensus level (von der Gracht, 2012); results that failed to meet either of these criteria indicated that no consensus could be achieved.
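The authority coefficient formula, C_r = (C_a + C_s)/2, can be illustrated with a short sketch. The (C_a, C_s) pairs below are hypothetical, and the mapping from the raw judgment and familiarity scores to values between 0 and 1 is assumed to have been done beforehand, since it is defined in the supplementary tables rather than in the text.

```python
from statistics import mean


def authority_coefficient(c_a: float, c_s: float) -> float:
    """C_r = (C_a + C_s) / 2, as given in the text."""
    return (c_a + c_s) / 2


# Hypothetical (C_a, C_s) pairs for three experts on one indicator.
expert_pairs = [(0.9, 0.8), (0.8, 0.8), (1.0, 0.7)]

c_r_values = [authority_coefficient(c_a, c_s) for c_a, c_s in expert_pairs]
mean_c_r = mean(c_r_values)

print("C_r per expert:", [round(v, 2) for v in c_r_values])
print(f"mean C_r = {mean_c_r:.2f}")
```

Averaging C_r over experts for each indicator gives the kind of per-indicator value that the Results compare against the 0.80 level.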

DATA AVAILABILITY STATEMENT
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

AUTHOR CONTRIBUTIONS
QJ: study design, drafting the manuscript and revising it critically for important intellectual content. WZ: study design, direction of analysis and methods. JY: participation in study design, revising the manuscript. HL: analysis and interpretation of data. MM: analysis and interpretation of data. YL: conception of the study, final revision of the manuscript.