Review of Statistical Methodologies for Detecting Drug–Drug Interactions Using Spontaneous Reporting Systems

Concomitant use of multiple drugs for therapeutic purposes is known as “polypharmacy situations,” which has been recognized as an important social problem recently. In polypharmacy situations, each drug not only induces adverse events (AEs) but also increases the risk of AEs due to drug–drug interactions (DDIs). The proportion of AEs caused by DDIs is estimated to be around 30% of unexpected AEs. The randomized clinical trials in pre-marketing typically focus emphasis on the verification of single drug safety and efficacy rather than the surveys of DDI, and therefore, patients on multiple drugs are usually excluded. However, unlike pre-marketing randomized clinical trials, in clinical practice (= post marketing), many patients use multiple drugs. The spontaneous reporting system is one of the significant sources drug safety surveillance in post-marketing. Commonly, signals of potential drug-induced AEs detected from this source are validated in real-world settings. Recently, not only methodological studies on signal detection of “single” drug, but also on several methodological studies on signal detection of DDIs have been conducted. On the other hand, there are few articles that systematically summarize the statistical methodology for signal detection of DDIs. Therefore, this article reviews the studies on the latest statistical methodologies from classical methodologies for signal detection of DDIs using spontaneous reporting system. This article describes how to calculate for each detection method and the major findings from the published literatures about DDIs. Finally, this article presented several limitations related to the currently used methodologies for signal detection of DDIs and suggestions for further studies.


INTRODUCTION
For safety surveillance of a drug, several data-mining algorithms are used to detect quantitative signals from spontaneous reporting systems. The data-mining algorithms include the frequency statistical models are the proportional reporting ratio (PRR) (Evans et al., 2001) and the reporting odds ratio (ROR) (van, Puijenbroek et al. 2002), and the Bayesian statistical models are [the information component (IC) as the Bayesian Confidence Propagation Neural Network (BCPNN) (Bate et al., 1998) and the gamma-Poisson shrinker (GPS) (Szarfman et al., 2002) used as the empirical Bayes geometric mean (EBGM)].
Although, the recent extension of the IC and the GPS can accommodate signals of high-order interactions (Almenoff et al., 2003;Yang and Fram, 2004;Norén et al., 2006;DuMouchel and Harpaz, 2012), generally, the PRR and the ROR are exploited for early signal detection of unknown "single" drug-induced adverse events (AEs). And these detection models might detect potential drug-induced AEs that could not be found clinical trials of premarketing using spontaneous reporting systems including postmarketing data.
The randomized clinical trials in pre-marketing typically focus emphasis on the verification of single drug safety and efficacy rather than the surveys of drug-drug interactions (DDIs), and therefore patients on multiple drugs are usually excluded from the clinical trial. However, unlike pre-marketing randomized clinical trials, in clinical practice (= post marketing), many patients use multiple drugs, as in polypharmacy situations.
Concomitant use of multiple drugs can affect the biological action of the related drugs. The main types of DDIs include pharmacokinetic and pharmacodynamic interactions (Aronson, 2004). Of them, the pharmacokinetic interactions might affect the metabolism of drug that determine bioavailability. On the other hand, there is no change in blood levels of drugs in the pharmacodynamic interactions, which can occur either competitively or non-competitively at the pharmacological receptor level.
In concomitant use of multiple drugs, each drug not only induces AEs but also increases the risk of AEs due to DDIs. The proportion of AEs caused by DDI has been estimated to be around 30% of unexpected AEs (Pirmohamed and Orme, 1998).
Adverse events caused by DDIs can also be prevented if discovered early like single drug-induced AEs, and it is practically difficult to examine the interactions of all drug combinations in the pre-marketing stage (Banda et al., 2016). Therefore, postmarketing surveys will help early detection of unknown AEs not only caused by single drug but also DDIs.
Recently, several methodological studies on signal detection of DDIs have been conducted. Herein, we review studies on the statistical methodologies for signal detection of DDIs using spontaneous reporting systems.

STATISTICAL MeTHODOLOGY
Logistic Regression Model van Puijenbroek et al. proposed a statistical method using the logistic regression model for detecting signals of DDIs from a spontaneous reporting system (van, Puijenbroek et al. 1999;van, Puijenbroek et al. 2002).
The ROR is a statistical model similar to odds ratio (van, Puijenbroek et al. 2002), and using the logistic regression model shown in Eq. 1, the ROR adjusted for age, gender, and concomitant drugs (drug D 1 and drug D 2 ) is used as the adjusted ROR.
where, a = age, G = gender, x 1 = drug D 1 , x 2 = drug D 2 , and x 1 x 2 = the concomitant use of drug D 1 and drug D 2 .
In their first study, the authors showed that concomitant use of oral contraceptives and the antifungal itraconazole resulted in the occurrence of withdrawal bleeding. In the second study, the authors showed that the efficacy of diuretics decreased with the concomitant use of diuretics and non-steroidal antiinflammatory drugs, resulting in worsening of congestive heart failure (van, Puijenbroek et al., 1999).
Signal detection using the logistic regression model has some limitations (e.g., ignoring dependencies/associations between AEs and regression analysis of more than 10,000 drugs as included in a spontaneous reporting system).
To overcome the limitations of the classical logistic regression model, a new statistical model; the Bayesian logistic regression model, which extended the logistic regression model corresponding to data of very large dimensions, was proposed. The Bayesian logistic regression model can perform regression analysis using millions of predictors contained in a spontaneous reporting system. (Genkin et al., 2007).
Using the Bayesian logistic regression model, Caster et al. also addressed masking effect (cf. Limitation) that affects background reporting of AEs  and confounding caused by the concomitant use of multiple drugs (Caster et al., 2010).

Multi-Item Gamma-Poisson Shrinker Model
The multi-item gamma-Poisson shrinker (MGPS) model is currently used by the US Food and Drug Administration (FDA) and is a statistical model that extended the GPS model for detecting signals of potential DDIs (Almenoff et al., 2003;Yang and Fram, 2004).
The MGPS model can calculate the score of "Drug-Drug-Event" or "Drug-Event-Event" (including that of with higherorder interactions such as "Drug-Drug-Drug-Event" or "Drug-Drug-Event-Event"). Moreover, the MGPS model can be applied to the itemsets of size 3 or more, but as the number of items increases, the calculation amount explosively increases.
In the MGPS model, Excess2 is used an indicator value. The signal detection threshold value is not set, and as the value of Excess2 is relatively large, the influence of interaction caused by concomitant drugs is predominantly suspected.
For an arbitrary itemset, it is desirable to estimate the expectation λ = E [N/E]. Where, N is the observed frequency of the itemset (= number of reports) and E is the count predicted from an assumption that items are independent, that is, the baseline count.
The observed frequency of itemset is defined by i, j, k,…, as N i , N j , N k …, E and other variables are defined as subscript letters as well as N. For example, E ij is the baseline prediction for the number of involving items i and j.
As a common model, baseline counts are calculated based on the assumption of within-stratum independence. E calculated under this assumption is often expressed as E0.
If all reports are assigned to the strata denoted by s = 1, 2,…, S, the proportion of reports in stratum s that contain the item i is expressed by P i s , and the total number of reports in stratum s is expressed by n s .
Here, the frequency of baseline for triple itemset (i, j, and k) is defined under independence as: For itemsets of size 3 or more, an "all-2-factor" loglinear model can be defined as the frequency E2 for the itemsets that match all the estimated pairwise two-way marginal frequencies but contain no high-order dependencies.
For itemsets of size 3 (e.g., DDI: drug D 1 and drug D 2 , and AE), the estimated frequency of the all-2-factor loglinear model can be defined as the frequency E2 prediction by simple subtraction is compared.
For example: The parameter λ is estimated by the geometric means, denoted as EBGM, of their empirical Bayes posterior distributions.
Detecting the signals of DDIs using the MGPS model is based on the EBGM value of the two drugs and the lower of the 90% confidence interval (CI) being larger than the upper of the 90% CI estimates for each of the two drugs.
Example, in one of the reports the signals of potential DDIs detected using the MGPS model is the AE profile of verapamil (the calcium channel blocker) and the combination of three classes of cardiovascular drugs (Almenoff et al., 2003).
This result revealed that the MGPS model for disproportionality measure is a promising statistical model for detecting signals of potential DDIs in polypharmacy situations.

Regression-Adjusted Gamma-Poisson Shrinkage Model
The GPS model proposed by DuMouchel is worse than the logistic regression model (Harpaz et al., 2013). However, unlike the GPS model, signal detection using t-tests in logistic regression models is not suitable for small samples such as rare AEs (DuMouchel and Pregibon, 2001 On the contrary, the major difference between the RGPS model and the MGPS model is that the MGPS model do not consider the effects for polypharmacy, and thus may lead to the underestimation of disproportionality estimate for the drug of interest. In addition, the RGPS model can handle this question. Additionally, the values of the adjusted expected value (E) in the RGPS model is not calculated by standard logistic regression but instead the extended logistic regression.
It is recommended to replace EBGM as the posterior geometric mean with the empirical Bayes relative reporting ratio (EBRRR) as the posterior mean in the RGPS model.
For each response, the (N j , E j ) pairs from the previous step are input into a gamma-Poisson shrinkage algorithm. The prior distributions are assumed to be simple gamma distributions rather than a mixture of two gamma distributions as is done in the MGPS model. Specifically, a two-parameter gamma Poisson model is used to produce shrinkage estimates, where the prior distribution of the relative reporting ratios is assumed to be Gamma (γ, δ) and where the (N j , E j ) pairs are used to estimate the hyperparameters γ and δ. The posterior mean of a drug relative reporting ratio is then EBRRR j = (N j + γ)/(E j + δ), and RRR05 and RRR95 are computed using the appropriate gamma distribution Gamma(N j + γ, E j + δ) (DuMouchel and Harpaz, 2012).
In the RGPS model of DDIs, n jk is defined as the number of reports including both drug j and drug k , and N jk is defined as the number of reports related to expected AEs. Then, EBRRR j and EBRRR k are defined as the corresponding disproportionality estimates for the two drugs in report i.
where P α is the function that links the linear predictor μ i to the probability scale and β j and β 0g(i) are the estimated coefficients for the drugs and intercepts, where the intercept depends on which grouped-stratum g(i) report i belong to. Additionally, Let X ij = 1 if drug j is included in report i, X ij = 0 otherwise, and let N j be the number of events reported with drug j. E jk is defined as the expected value (E) of N jk under the null hypothesis that both drug j and drug k have no effect of the RRR.
where, β j or β k is considered as 0 if the suspected drug was not in the logistic regression model.
"No interaction" indicates that the disproportionality measure for both the drugs (= N jk /E jk ) is expected to be higher for the EBRRR j and EBRRR k . Therefore, the no-interaction expected count is defined as follows: There will be J int (J int -1)/2 raw interaction ratio (INTRR jk ) of the form: DuMouchel et al. proposed a method to use one-parameter prior gamma distribution (γ 1 , γ 1 ), of mean 1, as a model for the mean of INTRR jk , and estimate γ 1 by inputting the set (N jk, E * jk ) into the empirical Bayes estimation.
As a result, the posterior mean of the interaction ratio is expressed as follows: The posterior 5% limit (INTEB 05jk ) and posterior 95% limit (INTEB 95jk ) are the corresponding quantiles of the gamma distribution (N jk + γ 1 , E * jk + γ 1  Harpaz (2012).
If the INTEB is very low, it has not yet been completely verified whether it are the signals of potential DDIs. However, because such results are often obtained, the further verification will be necessary.

extended Information Component Model
The IC is a measure of association of pairs of drug and AEs only, but there is often an interest in high-order interactions as DDIs (= itemsets of size 3).
Although, an extension of the IC to 3rd-order associations including 3 itemset as DDIs was proposed by Orre et al. (2000), the proposed method did not compensate for pairwise associations. Therefore, Norén et al. (2006) proposed the following definition for the extended IC model: where, As with simple algebraic operations, IC xyz can be re-expressed as follows: Although arbitrarily accurate estimates for the posterior mean of IC distribution can be used (Koski and Orre, 1998), the maximum a posteriori (m.à.p.) estimates can be used for central estimates instead, because IC distribution is generally unimodal.
There are three main advantages of the m.à.p. estimate. First, it is well suited for use in stratified IC. Second, it has the intuitive property of being equal to 0 when the estimated joint probability is equal to the product of the estimated marginal probabilities. Third, the concept of most likely value for an unknown parameter is perhaps more natural than that of the expected value.
These are important aspects in drug safety applications, and the results must be interpretable not only by statisticians but also by non-statisticians such as medical professionals. Norén et al. (2006) proposed the following m.à.p. estimate. Most of the theory developed for the pairwise IC model (IC m.à.p. model) holds approximately the IC model for higher order.
In one of the reports, the signals of potential DDIs detected using the extended IC model was terfenadine and ketoconazoleinduced ventricular fibrillation. There were five reports of ventricular fibrillation due to the combination of terfenadine and ketoconazole in the VigiBase ® as a spontaneous reporting system, and the extended IC (IC xyz m.à.p. ) value is 2.40 with the lower of the 95% CI of 1.08 (Norén et al., 2006).

Ω Shrinkage Measure Model
The Ω shrinkage measure model was proposed to calculate the observed-to-expected ratio as a spontaneous reporting system for detecting the signals of potential DDIs (Norén et al., 2008). Norén et al. criticized the logistic regression model in missing out on several signals that strongly suggestive of potential DDIs, additionally, they demonstrated that after conducting comparative studies using the World Health Organization database, the Ω shrinkage measure model is a refined method compared to the logistic regression model.
For the Ω shrinkage measure model, the observed reporting ratio was f 11 of AE caused by concomitant use of 2 drugs: drug D 1 and drug D 2 , in addition, its expected value was E[f 11 ].
where, n is the number of reports shown in Figure 1. For example, n 111 is the number of reported target AEs caused by drug D 1 and drug D 2 .
E[f 11 ] is unknown. However, f 11 can be compared with the estimator g 11 of E[f 11 ], g 11 is given as follows: When f 10 < f 00 (which denote no risk of AE caused by drug D 1 ), the most sensible estimator g11 = max (f 00 , f 01 ) is yielded and the vice versa when f 01 < f 00 .
Norén et al. defined a non-shrinkage measure for detecting AEs caused by drug D 1 and drug D 2 as follows: However, since the occurrence of AE is rare, g 11 might show very small, and therefore, Ω 0 is sensitive to spurious relationship and tends to falsely detect a signal. This is a well-known phenomenon in screening pairwise drug-AE excessive reporting rates in a spontaneous reporting system, and shrinkage has been proven to be an effective approach in reducing the sensitivity to random fluctuations in disproportionality measures based on rare cases. The models such as the BCPNN and EBGM also used pairwise measures of disproportionality as shrinkage measures.
To construct a similar shrinkage measure from Eq. 14, Norén et al. re-expressed the observed and expected RRR f 11 and g 11 in terms of the observed number of reports n 111 and expected numbers of reports E 111 = g 11 ×n 11+ , respectively: and proposed the Ω shrinkage measure: α is the tuning parameter that determines the shrinkage strength. When α = 0, Ω = Ω 0 . The effect of α is equivalent to that of α additional expected reports, and exactly matches the increase in the observed number.
Shrinkage regression can be set as the value of tuning parameter based on cross-validation estimates for classifier performance. However, in a disproportionality analysis, there is no objective basis for selecting a particular value for α. Therefore, in the Ω shrinkage measure model, α = 0.5 was set to provide sufficient shrinkage for avoiding disproportional highlighting based on rare reports.
In the frequentist method, Ω differs slightly from Ω 0 for large n 111 and E 111 , and the variance of Ω 0 is given as follows: Var Ω Var Using Eq. 17, the lower of the 95% CI for Ω can be estimated using the following equation: where, ϕ(0.975) is 97.5% of the standard normal distribution. On the contrary, in Bayesian method, the exact CI for µ can be obtained as solutions to the following equation, for appropriate posterior quantiles µ q : where, α is the tuning parameter. n 111 and E 111 are the number of reported target AEs caused by drug D 1 and drug D 2 and their expected values.
Here, the logarithm of the solution to Eq. 19 for q = 0.025 and 0.975 provides Ω 025 (the lower limits of 95% CI) and Ω 075 (the upper limits of 95% CI), respectively.
In both frequentist and Bayesian methods, Ω 025 > 0 is used as a threshold for detecting the signals of the concomitant use with drug D 1 and drug D 2 .
Qian et al. built a computerized system in which data acquisition and placement are automated. The signals of potential DDIs were then detected using this system. (Qian et al., 2010). This study detected the signals of potential DDIs using three different models; the Ω shrinkage measure model, the logistic regression model (cf. Logistic Regression Model), and the additive model and multiplicative models FIGURe 1 | Four-by-two contingency table for the evaluation of drug-drug interaction.
FIGURe 2 | Two-by-two contingency table for the evaluation of drug-drug interaction.
Frontiers in Pharmacology | www.frontiersin.org November 2019 | Volume 10 | Article 1319 (cf. Additive and Multiplicative Models). A comparison of signals detected using the three models revealed that the signals of potential DDIs detected on average by at least two models could reflect the fact that the 3 models are highly correlated (Qian et al., 2010).

Additive Model
In the additive model, if the risk associated with drug D 1 without drug D 2 is the same as the risk associated with drug D 1 and drug D 2 together, then there is no signal of DDI. In other words, there are potential DDIs if the combination risk is high compared to what is expected based on the individual drug: This equality implies (RD: risk difference): That is, when RD drug D1∩drug D2 -RD only drug D1 + RD only drug D2 > 0 (p 11 − p 10 − p 01 + p 00 > 0), the signal of the additive model is detected.

Multiplicative Model
In the multiplicative model, under the assumption that the null hypothesis is true (i.e., no interaction), the proportion of an AE associated with the concomitant use of drug D 1 and drug D 2 is the same as the proportional risks of individual drugs in the absence of either drug D 1 or drug D 2 .
This equality implies:   where, x 1 = drug D 1 , x 2 = drug D 2 , x 1 x 2 = the concomitant use of drug D 1 and drug D 2 . Thakrar et al. (2007) showed that the additive model has higher sensitivity than that of the multiplicative model in detecting the signals of potential DDIs. Therefore, Noguchi et

Combination Risk Ratio Model
To estimate the degree of potential safety risk in combination, Susuta and Takahashi (2014) proposed a risk assessment method for combined use of drugs at a frequency where two or more drugs are reported simultaneously, assuming that the possibility of drug interaction is a combined risk in the occurrence of AEs.
The concomitant use risk was determined when the ratio between the concomitant use indicator and the indicator (e.g., PRR, ROR) obtained separately for both agents exceeded 2. The following is an expression using the PRR as the indicator.
The formula for calculating PRR and χ 2 is as follows: Additionally, to calculate the PRR and the χ 2 of drug D 1 ∩ drugD 2 , drug D 1 and drug D 2 , replace it as follows.
To check the validity of the combination risk ratio model, the reports of Stevens-Jonson syndrome (SJS) or toxic epidermal necrolysis caused by the DDIs were analyzed using the Japanese Adverse Drug Event Report database.
As for the concomitant use of suspected drugs, which fulfill the situations of concomitant use risk, SJS: 10 candidates out of 159 combinations and toxic epidermal necrolysis: 22 candidates out of 111 combinations were detected.
In addition, this method proposed by Susuta et al. has been used to search for the DDIs related to the concomitant use of angiotensin receptor blockers and thiazide diuretics combination therapy by Noguchi et al. (2015) and for detecting signals of the concomitant use of deferasirox with other drugs by Mizuno et al. (2016) in Japan. Gosho et al. (2017) proposed the chi-square statistics model for detecting the signals of potential DDIs.

Chi-Square Statistics Model
First, they developed the following measure (χ 0 ) to estimate the discrepancy between the observed and expected numbers of AEs with drug combinations: The expected number of AEs (E 111 ) can be estimated using E 111 = g 11 ·n 11·, presented in Ω Shrinkage Measure Model. The measure χ N , which is the square root of the chi-square test statistic, is based on the normal approximation of the Poisson model, and therefore, χ N is not suitable for the evaluation of rare events. Thus, when evaluating rare events, it is generally considered more appropriate to use the chi-square test with Yate's correction than the standard chi-square test (Yates, 1934), hence, χ was also corrected with the correction term "0.5" based on the chi-square test with Yate's correction: Gosho et al. (2017) set χ > 2 and χ > 2.6 as thresholds for detecting the signals of AEs caused by DDIs in a simulation study. These cutoff values are specified based on 95% and 99% of chi-square distribution with one degree of freedom. According to this simulation study, with the criterion: χ > 2, false positives are controlled within acceptable ranges, additionally the chi-square statistics model showed higher sensitivity and AUC than those of both frequentist and Bayesian methods of the Ω shrinkage measure model (Gosho et al., 2017).
Similar to the Ω shrinkage measure model, the detection of signal using the chi-square statistics model is designed to focus on the detection of synergistic rather than antagonism among some DDIs. Gosho (2018) used the chi-square statistics model and he Ω shrinkage measure model to examine the clinical drug-drug interactions that cause hypoglycemia and rhabdomyolysis (Gosho, 2019).

Association Rule Mining Model
To comprehensively search for the signals of potential DDIs, if a calculation using the conventional methods that simply create combinations from a large database such as a spontaneous reporting system is used, the considered number of the concomitant use would be enormous. Therefore, it would be difficult to detect the signals of potential DDIs at an early stage.
Contrarily, the association rule mining model is frequently used to find interesting combinations hidden in large databases, and not just medical databases. In the association rule mining model, the "a priori algorithm" can be used to reduce the number of calculations (Agrawal et al., 1993;Agrawal and Srikant,1994).
If the association rule mining model was used, it is unnecessary to calculate indicators for all combinations of the concomitant use, as the previous models.
An indicator of a general association rule model is shown below. Among the transaction T as a set of items, an association rule can be expressed as the antecedent of rule X → the consequent of rule Y; where, X and Y are mutually exclusive sets of items.
There are several indicators of the association rule mining model. First, the support is defined as the percentage of all items in both X and Y to transaction T in the data. That is, how frequently the rules (X → Y) occur within transaction T. The support is as follows: Second, the confidence is the conditional probability P(Y|X), and measures the reliability of the interference made by the rules (X → Y). The confidence is as follows: Third, the lift of an association rule represents the ratio of probability. It is the ratio between the confidence of the rule and the support of the itemset in the consequent of the rule. The lift is as follows: If the lift is > 1, it shows the degree to which those two occurrences depend on each other. Therefore, the lift is often used frequently to assess the interest of a rule.
Finally, the conviction of an association rule can be interpreted as the ratio of the expected frequency that X occurs without Y if X and Y are independent and divided by the observed frequency of incorrect predictions. The conviction is as follows: In the lift, even if X and Y are interchanged, the value of the indicator is the same. On the contrary, in the conviction, when X and Y are interchanged, the value of the indicator is different. This indicates that the lift cannot be evaluated correctly if Y is actually the antecedent of rule and X is actually the consequent of rule, and the conviction can be also evaluated correctly in such a situation.
So far, we have introduced four indicators that are particularly commonly used in the association rule mining model. Next, three search models of the signals of potential DDIs using these indicators are shown using Figure 3. Shirakuni et al. (2009) examined the combined use and discrete use of 2 drugs using association rule mining model.

Shirakuni's Method of Association Rule Mining Model
In the combined use of two drugs model, the antecedent of rule X was defined as drugs D 1 and D 2 , and the consequent of rule Y was defined as the target AE (AE).
In the discrete use of 2 drugs model, the antecedent of rule X was defined as drugs D 1 (or 2) -induced AE, and the consequent of rule Y was defined as drugs D 2 (or 1) . In this rule, both hypotheses and conclusions are relevant to the AE, and therefore, signals can be detected from drugs D 1 and drugs D 2 individually.
The support and confidence of each drug is calculated for both drugs D 1 and drugs D 2 based on the cases of patients presenting AEs included in the dataset.
Kubota purposed that because the PRR show the generation ratio of AEs, the result is evaluated regardless of sample size and χ 2 is important when examining the total sample size (Kubota, 2001). Therefore, the drugs with high log PRR and log χ 2 values are considered to have a strong signal.
To evaluate Shirakuni's method, the signal score obtained by adding the log PRR and log χ 2 was used as the strength of the signal. This signal score is also used to compare signals for sex and age differences (Noguchi et al., 2018b;Noguchi et al., 2018c).
The FDA Adverse Event Reporting System dataset had sufficient information to apply the association rule mining model. In the association rule mining model, high indicators of the support and confidence are generally evaluated as a strong relationship. Next, Shirakuni et al. (2009) compared each signal score of the SJS caused by DDIs with the results of the association rule mining model to evaluate the performance of the proposed model.
In this result, the correlation between "discrete use of 2 drugs" and the signal score was weaker than that of "combined use of 2 drugs. " Therefore, it was concluded that, among the two methods of the association rule mining model proposed by Shirakuni et al. (2009), the method focused on "combined use of 2 drugs" detected such important signals at an early stage.

Harpaz's Method of Association Rule Mining Model
In Harpaz's method , like the combined use of two drugs model suggested by Shirakuni et al. (Shirakuni et al., 2009), the antecedent of rule X was defined as drugs D 1 and D 2 , and the consequent of rule Y was defined as the AE.
However, in the association rule mining model, it is sometimes inappropriate to evaluate using the confidence value. For example, frequently reported AEs (e.g., nausea) produce large confidence values regardless of the drug associated with AEs. Whereas, rarely reported AEs may produce small confidence values, although AEs are strongly associated with certain drugs.
Therefore, in Harpaz's method, the RRR was used instead of confidence as the second parameter to qualify the worthiness or strength of an association rule .
The RRR is defined as the ratio of the observation frequency of the rule to the prediction frequency of the baseline, and is shown as Eq. 43.
The other disproportionality analysis methods are based on the RRR, namely the BCPNN and the EBGM in the signal detection of a single drug.

RRR
Observed Expected confidence drug D and dru = = N is the total number of records in the data. Extrapolating from Harpaz's evaluation sample, the full set of potential DDIs identified by the method can be described by the taxonomy and proportions shown below.
Drugs are divided into the following three categories; (1) drugs known to be administered together or treat the same indication: 57%; (2) drugs with the same active ingredient: 2%; and (3) supposedly unrelated drugs: 41%.
AEs are divided into the following four categories: (1) one of the drugs is known to cause effect: 22%; (2) all drugs are known to cause effect: 21%; (3) none of the drugs is known to cause effect: 27%; and (4) confounded association, where drugs are administered to treat the AE: 30%.
In evaluations using Harpaz's method, the results demonstrate that a significant number of DDIs can be identified. Additionally, the very low p-value indicates that it is extremely unlikely that Harpaz's method detected them just by chance, and thus is a valid statistical model for signal detection.

Noguchi's Method of Association Rule Mining Model
We proposed Noguchi's method using the association rule mining model (Noguchi et al., 2018a). In Noguchi's method, the antecedent of rule X was defined as drug D 2 (or 1) and the consequent of rule Y was defined as drug D 1 (or 2) -induced AE. That is, Noguchi's method focuses on how much additional drug D 2 (or 1) contributes to drug D 1 (or 2) -induced AE.
The lift according to this model indicates that the presence of drug D 2 (or 1) influences the probability of drug D 1 (or 2) -induced AE. Furthermore, in this method, it was confirmed by conviction that the DDIs obtained are not a false prediction.
In the study by Noguchi et al., lift of >1 and conviction of >1 were used as the criterion for detection using the association rule mining model. As the risk data for verification was created by the combination risk ratio model presented in Section 2.6, there is no combination of n < 3 in the risk data for verification. Therefore, in the verification, the combination of n 111 < 3 was excluded from the signal and n 111 ≥ 3 was added to the criterion for detection.
In Noguchi's method, to compare the detection power, all combinations of DDIs were calculated using the association rule mining model. Therefore, it has not been determined how much computation time could be reduced compared to the previous methods using the a priori algorithm.
However, given the number of drugs registered in the spontaneous reporting systems, there are several potential combinations of DDIs. As Noguchi's method simplifies the computation, it is expected that the time for signal detection will be reduced as well as statistical models using other association rule mining model in actual search.
The association rule mining model is easy to extend to higherorder interactions. However, among the three methods presented in this review, the gold-standard has not been determined.
The chi-squared statistics is useful to determine the statistical significance level. Alvarez showed that chi-square statistics can be calculated directly using confidence, support, and lift with Eq. 46 (Alvarez, 2003).
The chi-squared statistics make it easy to validate combinations obtained using the standard association rule mining model (e.g., Shirakuni's method and Noguchi's method), and can identify statistically significant signals of DDIs that might be false positives.

Causal Association Rule Discovery Model
As described in Association Rule Mining Model, the association rule mining model is often used to discover the signals of potential DDIs in the spontaneous reporting system. However, the main limitation of the traditional association rule mining model is that the strength of signals is measured based on correlation, not causality.
Several studies have been reported on the concept of causality, such as inductive causality models (Pearl, 2000), causal Bayesian network based methods (Spirtes et al., 2001), an additive noise model (Hoyer et al., 2008), and a hybrid approach (Cai et al., 2013), however causal discovery on high-order and sparse data of DDIs is still unsolved.
To solve this problem, instead of reconstructing a causal Bayesian network, Cai et al. proposed the causal association rule discovery (CARD) model with the aim to detect the true causal relationship between the concomitant use of two drugs and AEs (Cai et al., 2017).
For the rule X → Y with X ≥ 3, any sub-rules containing two antecedents must also form the V-structure with the AE: drugs D 1 → AE ← drugs D 2 (e.g., aspirin → Bleeding ← warfarin).
Because the interesting of rule X (drugs D 1 , drugs D 2 ) → Y (AE) is dependent on the weakness of its sub-rules, and the causal association interesting measure (CAIM) is defined as follows: The dominance of the CARD model was determined by physician assessment of 100 randomly selected higher-order associations detected using the CARD model and Harpaz's method of association rule mining model (cf. Harpaz's Method of Association Rule Mining Model) . In the identification of known DDI, the CARD model was more accurate than Harpaz's method: CARD model (20%) vs. Harpaz's method (10%). Furthermore, in the CARD model, the detection of unknown combinations is less than Harpaz's method: CARD model (50%) vs. Harpaz's method (79%) (Cai et al., 2017).

LIMITATION
The spontaneous reporting systems used in these studies are based on clinical trials and post-marketing spontaneous reports, so only AEs observed are registered, and their causal relationship is unclear. Therefore, the cases may be underreported. Furthermore, the number of reports and signal values are influenced by various factors. Although not necessarily apparent, the number of cases increases in the first 2 years post-marketing and then begins to decrease. This is known as the Weber effect (Weber, 1984;Hartnell and Wilson, 2004).
The number and score of signals also possibly fluctuate during several years after launching (Hochberg et al., 2009). After drug-induced AE is highlighted, the number of reports may generally be accelerated. This is known as the notoriety effect (Pariente et al., 2007).
Additionally, the reports of drugs in the same class to those reported may also be accelerated. This is known as the ripple effect (Pariente et al., 2007).
The signal may be underestimated by numerous reports and that the same AE is associated with other drugs. This is called the masking effect or cloaking effect . Matsuda et al. (2015) clarified that factors related to druginduced AEs reporting attitudes in Japan may be different from those in other countries due to the involvement of medical representatives early post-marketing phase vigilance as a part of Japanese unique system of surveillance and the voluntary reporting process.
Thus, the spontaneous reporting systems are affected by several reporting biases and the state of the country's survey. Furthermore, the report rates of AEs vary from year to year, and the value of the signal can easily vary with the timing of the survey.
In addition to the general limitations of study using the spontaneous reporting systems, the research of DDIs has some unique limitations.
In the surveillance for of DDIs, the lack of information about one of the two drugs will overestimate the RRR of drug-induced AEs, when either drug is used alone (Norén et al., 2008). This is a serious problem in evaluating the AEs of DDIs, because it leads to under-reporting of n 111 and over-reporting of n 101 or n 011 (Figure 1). Furthermore, some of these statistical models do not apply to interactions with three or more drugs.
Finally, these statistical models are designed to focus on the detection of synergism rather than antagonism among some interaction of DDIs.

CONCLUSIONS AND PeRSPeCTIveS
In this review, we have discussed statistical methodologies for signal detection of DDIs in spontaneous reporting systems. To the best of our knowledge, this is the latest review including recently proposed statistical methodologies.
The bivariate disproportionality analysis (e.g., single druginduced AE) represents the bulk of daily routine of PhV. However, as the use of multiple drugs becomes more common, the problems of AEs due to DDIs cannot be ignored. Therefore, in the future operations of PhV, it is important to detect signals of unknown DDIs at an early stage.
In the bivariate disproportionality analysis, the frequentist methods generally have the following advantages and limitations compared with Bayesian methods. Several comparative studies of detection trends of these detection approaches have been reported (van, Puijenbroek et al., 2002;Kubota et al., 2004;Li et al., 2008;Bonneterre et al., 2012;Ang et al., 2016;Pham et al., 2019).
The advantages of the frequentist methods are generally as follows: 1. early signal detection, 2. sensitive, 3. easily applicable, and 4. easy to understand. While the limitations are 1. detection of false positive signals and 2. low specificity.
Although these advantages and limitations are considered to show a similar tendency in the signal detection models of DDIs, at this stage, the verification is not sufficient. Furthermore, the statistical models introduced in Statistical Methodology are not sufficiently clarified the difference in detection power. Therefore, in the future, it is necessary to examine the similarity and specificity of the signal detection tendency of each statistical model introduced.
As mentioned in Limitation, there are various biases (Weber, 1984;Hartnell and Wilson, 2004;Pariente et al., 2007;Hochberg et al., 2009;Wang et al., 2010;Matsuda et al., 2015) as these signals are calculated using the spontaneous reporting system. So the signal obtained is only a hypothesis. This does not change whether it is signals of single drug or DDIs. Therefore, considerable attention must also be paid to the interpretation of results in signal research of DDIs.
As indicated so far, most studies have focused on the analysis of AEs caused by the concomitant use of two drugs. However, in polypharmacy patients, the occurrence of AEs by interaction of multiple drugs (e.g., fourth order drug interaction: drug D 1drug D 2 -drug D 3 -AE) is a concern. Therefore, in the future, establishment of a signal detection method for this higher order drug interaction will be more important.
This review has introduced only statistical methodologies for detecting DDIs based on the number of AEs reported.
In recent years, the method for detecting the signals that use time-to-onset instead of the number of reports have been studied (van Holle et al., 2012;van Holle et al., 2014;Scholl and Van Puijenbroek, 2016), but there are no examples of using them for DDIs. Since it may be possible to detect the signals that cannot be obtained with the statistical models introduced in this review, further studies are expected.