Individual differences in the effects of midazolam on anxiety-like behavior, learning, reward, and choice behavior in male mice

Introduction The aim of the present study was to investigate the behavioral effects of the benzodiazepine midazolam in male mice, in models of anxiolysis, learning, and abuse-related effects.
Methods In a first set of experiments, male Swiss mice were submitted to the training session of a discriminative avoidance (DA) task on the elevated plus maze to evaluate anxiety-like behavior and learning after vehicle or midazolam (1, 2 or 5 mg/kg, i.g.) administration. The same animals were submitted to a conditioned place preference (CPP) protocol with midazolam (1, 2 or 5 mg/kg, i.g.). In a second experiment, outbred (Swiss) and inbred (C57BL/6) male mice were submitted to a two-bottle choice (TBC) oral midazolam drinking procedure. Animals were exposed to one sucrose bottle and one midazolam (0.008, 0.016 or 0.032 mg/ml) plus sucrose bottle.
Results Midazolam (1 and 2 mg/kg) induced anxiolytic-like effects, and all doses of midazolam prevented animals from learning to avoid the aversive closed arm during the DA training session. Assessment of midazolam reward via the CPP procedure and choice via the TBC procedure showed notable variability. A 2-step cluster analysis for the CPP data showed that midazolam data were well-fitted to 2 separate clusters (preference vs. aversion), albeit with the majority of mice showing preference (75%). Correlational and regression analyses showed no relationship between midazolam reward and anxiolytic-like effects (time spent in the open arms in the DA test) or learning/memory. Two-step cluster analysis of the TBC data also demonstrated that, regardless of strain, mice overall fell into two clusters identified as midazolam-preferring or midazolam-avoiding groups. Both midazolam preference and avoidance were concentration-dependent in a subset of mice.
Discussion Our findings show that midazolam preference is a multifactorial behavior, and is not dependent solely on the emergence of therapeutic (anxiolytic-like) effects, learning impairments, or on genetic factors (inbred vs. outbred animals).


Introduction
Benzodiazepines are among the most widely prescribed psychiatric medications, with more than 8% of the adult U.S. population reporting benzodiazepine use (1). This widespread use is partially driven by benzodiazepine prescriptions for one of their many therapeutic uses, predominantly anxiety and sleep disorders. However, benzodiazepine misuse has also increased in recent years, with nearly 20% of individuals who use benzodiazepines in the U.S. reporting misuse (2,3). This has prompted a growing public health concern, particularly due to increasing rates of benzodiazepine-related overdose deaths and emergency department visits in recent years (4).
Decades of research have helped elucidate the pharmacological mechanisms underlying the behavioral effects of benzodiazepines (5). However, many questions remain unanswered regarding their abuse-related behavioral effects, particularly due to inconsistencies in the literature. The positive reinforcing effects of benzodiazepines have been shown more consistently in studies using intravenous drug self-administration (5,6). On the other hand, many pre-clinical self-administration studies using the oral route showed no or low reinforcing effects of benzodiazepines (7-9). Similarly, benzodiazepines have been shown to exert rewarding effects in the conditioned place preference (CPP) model in some studies (10-12), but not others (13,14). This discrepancy is particularly relevant given that studies show that benzodiazepines can increase (15,16), decrease (17,18), or even not alter (19,20) extracellular dopamine brain levels, depending on the study or protocol.
In addition to these inconsistencies, the relationship between the anxiety-decreasing (anxiolytic) effects and the abuse potential of benzodiazepines remains poorly understood. Anxiety is a risk factor for sedative, hypnotic, or anxiolytic use disorder (21), and studies have shown that greater anxiety sensitivity is associated with increased rates of non-medical benzodiazepine use (22,23). Epidemiological studies also show that anxiety is associated with higher rates of benzodiazepine misuse and use disorder [for review see (4)]. However, whether experiencing an anxiolytic effect is associated with and/or necessary for the emergence of the positive reinforcing effects of benzodiazepines remains unknown.
The aim of the present study was to investigate the behavioral effects of the benzodiazepine midazolam in male mice, with a focus on its rewarding effects and self-administration. The rewarding effects of midazolam were evaluated using CPP, and were compared with the anxiolytic effects of this drug using an elevated plus maze-discriminative avoidance task (24). Self-administration of midazolam was evaluated using a two-bottle choice (TBC) model. Inbred and outbred mice were used in the TBC model to investigate potential broad genetic determinants of midazolam preference/avoidance.

Animals
Three-month-old Swiss male mice from the breeding colony of Universidade Estadual de Santa Cruz (UESC) and 3-month-old C57BL/6 male mice obtained from Harlan/Envigo were used. The first set of experiments (Swiss mice; discriminative avoidance task and CPP) was performed at UESC, and animals were group housed (8 per cage) in polypropylene cages (41 × 34 × 16.5 cm). Rodent chow (Nuvilab, Quimtia SA, Colombo, PR, Brazil) and water were available ad libitum throughout the experiments. The TBC experiments were performed at UESC (Swiss mice) and at the University of Mississippi Medical Center (UMMC, C57BL/6 mice), and subjects were individually housed in polypropylene caging (30 × 19 × 13 cm) with food available ad libitum and fluids restricted to those available within the context of the experiment 24 h per day. All animals were maintained under controlled temperature (22-23 °C) and light conditions.

Elevated plus maze-discriminative avoidance task
The elevated plus maze-discriminative avoidance task model was developed to allow for the investigation of several behaviors and behavioral effects of drugs in the same model. This model allows for the simultaneous investigation of learning/memory, anxiety-like behavior and locomotor activity (24-26). The elevated plus maze-discriminative avoidance task has been validated with the use of several different drug classes, including anxiolytic drugs such as benzodiazepines (24, 27-29) and ethanol (30,31), stimulants (24, 26, 32-34) and opioids (35).
The behavioral sessions were performed in a modified elevated plus maze, in which the animal explored two adjacent closed arms (28.5 × 7 × 18.5 cm), one of which was aversive (aversive stimuli: 100-W light and 80 dB noise when the animal entered the arm), and two adjacent open arms (28.5 × 7 cm) lacking the light and noise stimuli. For this experiment, a test session was performed 24 h after a training session. During the 10-min training session, animals were placed individually in the center of the apparatus with free access to the 4 arms. The aversive stimuli were activated each time the animal entered the aversive closed arm and were interrupted when the animal left this compartment. The test session lasted

Conditioned place preference
The CPP apparatus consisted of two conditioning compartments of equal size (40 × 20 × 20 cm): compartment A, with black and white vertical lines on the walls and a black wooden floor, and compartment B, with black and white horizontal lines on the walls and a dark (red) smooth floor. The two main compartments were connected by a central compartment (40 × 10 × 15 cm) that was accessible by sliding doors. Test sessions were filmed, and the time spent in each compartment was measured using the ANY-maze® software (version 5.1, Stoelting). The CPP procedure consisted of the following phases:
Habituation (Day 1): Animals were placed in the center of the apparatus with free access to all compartments for 10 min. No treatments were administered.
Pre-conditioning test (Day 2): Animals were placed in the center of the apparatus with free access to all compartments, and behavior was recorded for 15 min. No treatments were administered.
Conditioning (Days 3-14): An unbiased design was used because animals showed no preference for either of the compartments in the pre-conditioning test. Therefore, animals were randomly assigned to an experimental group and to a "midazolam-paired compartment" in a counterbalanced manner. The conditioning sessions were performed over 12 consecutive days, during which the doors remained closed and animals were confined to one of the conditioning compartments. On odd days, animals received an intragastric administration of midazolam. On even days, animals received an intragastric administration of saline. Ten minutes after midazolam or saline administration, animals were confined to the assigned drug- or saline-paired compartment for 10 min.
Post-conditioning test (Day 15): Animals were placed in the center of the apparatus with free access to all compartments, and behavior was recorded for 15 min. No treatments were administered.

Two-bottle choice
Subjects were initially habituated to two 15 ml bottles of water for 3 days, followed by habituation to two 15 ml bottles containing 4% sucrose for another 3 days. Following this initial habituation phase, subjects had 24-h access to two 15 ml drinking bottles in their individual home cages, one containing 4% sucrose, and the other containing 4% sucrose plus midazolam (0.008, 0.016 or 0.032 mg/ml). All subjects were exposed to each concentration of midazolam for 14 days, and bottle sides were switched every 7 days. Consumption from each bottle was measured once every 24 h, at which time all subjects were weighed and bottles were refilled. In order to ensure data were not affected by liquid loss due to bottle leaks, for each cohort (Swiss vs. C57BL/6 cohorts) two bottles were left in an empty cage for 1 week, during which time liquid loss was measured and found to be <0.1 ml/day.

Experimental design

Experiment 1. Evaluation of the anxiolytic-like, cognitive and rewarding effects of the benzodiazepine midazolam
Forty-eight Swiss male mice were randomly distributed into four groups and submitted to the elevated plus maze-discriminative avoidance task procedure, as described previously. On the training day, animals received an intragastric administration (gavage) of vehicle solution (n = 18) or midazolam at doses of 1 (MDZ 1, n = 12), 2 (MDZ 2, n = 18) or 5 (MDZ 5, n = 12) mg/kg. Ten minutes after administration, animals were placed individually in the center of the apparatus and had free access to all arms of the apparatus for 10 min. Twenty-four hours after the training session, all animals were submitted to a 3-min drug-free test session.
One week after the test day, the animals treated with midazolam during the discriminative avoidance task experiment were submitted to the CPP protocol, as described previously. Animals were maintained in the same midazolam groups, receiving the same dose of midazolam during the discriminative avoidance and the CPP protocols. Animals were submitted to the habituation and pre-conditioning test sessions. During the conditioning phase, animals received intragastric administration (gavage) of midazolam (1, 2 or 5 mg/kg, groups MDZ 1, MDZ 2 and MDZ 5, respectively; n = 12 per group) on odd days and were confined to the drug-paired compartment for 10 min. On even days, all animals received intragastric administration (gavage) of saline and were confined to the opposite compartment for 10 min.
Twenty-four hours after the last day of the conditioning protocol, the post-conditioning test was performed, and the time spent in each of the main compartments was recorded. Expression of drug-induced CPP or conditioned place aversion was determined using the "score" measure (time spent in the drug-paired compartment minus time spent in the saline-paired compartment). A longer time spent in the compartment associated with the drug compared to the compartment paired with saline (positive score) was considered indicative of the development of midazolam-induced CPP, while a negative score

Experiment 2. Evaluation of midazolam choice behavior
Twenty-nine Swiss (outbred) male mice and 43 C57BL/6 (inbred) male mice were submitted to the habituation and TBC protocols, as previously described. Consumption of the sucrose and midazolam plus sucrose solutions was averaged for the last 3 days of self-administration of each midazolam concentration (0.008, 0.016 or 0.032 mg/ml). Preference or aversion for the midazolam bottle over the sucrose-only bottle was assessed for each midazolam concentration by calculating (consumption from the midazolam bottle / total consumption from the two bottles) × 100.
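The preference calculation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the argument names, and the example volumes are hypothetical and do not come from the study data.

```python
from statistics import mean


def percent_preference(mdz_ml, sucrose_ml, last_n_days=3):
    """Percent preference for the midazolam + sucrose bottle.

    mdz_ml, sucrose_ml: daily consumption (ml) from each bottle, in
    chronological order (hypothetical input format). Only the last
    `last_n_days` values are averaged, as described in the text.
    """
    mdz = mean(mdz_ml[-last_n_days:])
    suc = mean(sucrose_ml[-last_n_days:])
    total = mdz + suc
    if total == 0:
        raise ValueError("no fluid was consumed from either bottle")
    return mdz / total * 100


# Made-up daily volumes for one mouse at one concentration.
example = percent_preference([1, 2, 3, 3, 3], [9, 9, 1, 1, 1])
```

Values above 50% indicate preference for the midazolam bottle and values below 50% indicate avoidance.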

Statistical analyses
The behavioral data from each experiment (Discriminative avoidance: % time spent on each arm of the device or immobility time; CPP: score; TBC: % preference) were analyzed using one- or two-way analysis of variance (ANOVA), with or without repeated measures (specific analyses described in the results section for each experiment). For all analyses, Bonferroni t-tests were used as the post-hoc test. In addition to the dependent measures listed above, three derived scores were calculated: (1) CPP score = time spent in the drug-paired compartment − time spent in the non-drug-paired compartment during the post-conditioning test session; (2) Learning score = time spent in the non-aversive closed arm − time spent in the aversive closed arm during the EPM training session; and (3) Memory score = time spent in the non-aversive closed arm − time spent in the aversive closed arm during the EPM test session (24 h after training). These analyses, as well as all graphical representations, were performed using the GraphPad Prism software (version 9).
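As a worked illustration of the three derived scores, the short Python sketch below computes them from per-animal times. The class and field names are hypothetical, and the numbers in the example are invented; only the three subtraction rules come from the text.

```python
from dataclasses import dataclass


@dataclass
class SessionTimes:
    """Time (s) per compartment or arm for one mouse (hypothetical fields)."""
    drug_paired: float          # CPP post-conditioning test
    non_drug_paired: float      # CPP post-conditioning test
    train_non_aversive: float   # EPM training session
    train_aversive: float       # EPM training session
    test_non_aversive: float    # EPM test session, 24 h after training
    test_aversive: float        # EPM test session, 24 h after training


def derived_scores(t: SessionTimes) -> dict:
    """Return the CPP, learning, and memory scores as simple differences."""
    return {
        "cpp": t.drug_paired - t.non_drug_paired,
        "learning": t.train_non_aversive - t.train_aversive,
        "memory": t.test_non_aversive - t.test_aversive,
    }


# Invented example values for a single animal.
scores = derived_scores(SessionTimes(500, 300, 200, 50, 100, 40))
```

A positive CPP score indicates more time in the drug-paired compartment, and positive learning or memory scores indicate more time in the non-aversive than in the aversive closed arm.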
Initial analyses of the data for the CPP studies revealed considerable variance for the CPP score, with distributions of scores predominantly positive (i.e., above zero, or no preference) but with negative scores (i.e., aversion) of relatively high magnitude. We proposed that mice showed either a significant preference or aversion to the midazolam-paired chamber, which represent diametrically opposed predictions for a drug reported consistently to have rewarding effects. Similarly, the TBC studies showed considerable variance for the percentage of preference for the midazolam-containing bottle, leading to a related prediction that mice either preferred or avoided consumption of midazolam. To test these possibilities, we conducted 2-step cluster analysis using CPP score and TBC preference measures. Two-step cluster analysis is a hybrid approach that first calculates a distance measure (centroids) to separate groups, followed by a probabilistic approach to choose an optimal subgroup (36,37). Distance measures were determined by the log-likelihood criterion and cluster numbers were determined by the Schwarz Bayesian Criterion, with a default of 15 clusters iterations total.
Cohesion and separation of clusters were evaluated using the silhouette coefficient. Internal validity was evaluated further by comparing clusters using unpaired t-tests, Fisher's exact tests (categorical data), ANOVA and planned Bonferroni t-tests (to test for dose-associated effects), as well as by conducting repeated clustering (n = 15) with a newly randomized order of data for each analysis (cluster results can depend on the order in which data are entered). External validation presented more difficulties, because of the lack of available data and analytic approaches providing construct validity associating either concurrent or mechanistic measures of midazolam reward. This study tested two hypotheses that addressed external validation: midazolam reward is mediated by (1) reduction of anxiety (anxiolysis) and (2) associative learning and memory processes. To assess these hypotheses, correlation (Pearson r) and regression analyses were performed with CPP score as a predictor of time in the open arms (anxiolysis), learning score and memory score. In addition, the concepts of preference and aversion in CPP procedures are conceptually related to preference and avoidance in the TBC procedure, although as with our hypotheses, there are no available data to address these comparisons directly. Regardless, a general concordance between CPP and TBC with regard to the number of clusters (i.e., preference vs. aversion/avoidance) would provide external validation. Cluster analyses were performed using IBM SPSS Statistics software (version 28). For all analyses, family-wise error rate (alpha) was constrained to p ≤ 0.05.

< 0.001), demonstrating that animals learned to avoid the aversive closed arm.
No significant differences were observed in the % time spent in the closed arms for the groups treated with the lowest doses of midazolam (1 and 2 mg/kg), and animals treated with the highest dose of midazolam (5 mg/kg) spent a significantly lower % time in the non-aversive closed arm compared to the aversive closed arm (p < 0.01), suggesting that all doses of midazolam impaired learning. In agreement, animals treated with 2 and 5 mg/kg midazolam were also significantly different from the vehicle group for both % time spent in the closed aversive arm (p < 0.001 and p < 0.0001, respectively) and % time spent in the non-aversive closed arm (p < 0.001 and p < 0.0001, respectively).
Results from the memory measures during the test session of the discriminative avoidance experiment are illustrated in Figure 2. For the analysis of the % time spent in the closed arms, two-way repeated measures ANOVA showed a significant interaction between compartment (aversive closed arm vs. non-aversive closed arm) and treatment (vehicle vs. midazolam) [F (3, 55) = 3.968; p < 0.05]. Only the vehicle group spent a significantly greater % time in the non-aversive closed arm compared to the aversive closed arm (p < 0.01), indicating that animals learned the association between aversive and non-aversive arms during the training session. No significant differences were observed between the % time spent in the aversive vs. non-aversive closed arms for midazolam-treated animals. Animals treated with midazolam spent a significantly higher % time in the aversive closed arm (p < 0.05 for 2 mg/kg) and a significantly lower % time in the non-aversive closed arm (p < 0.05 for 2 and 5 mg/kg) compared to the vehicle group.

Conditioned place preference
CPP results, measured as the difference in time spent in the midazolam- and saline-paired side (CPP score), are shown as a function of pre-conditioning and post-conditioning in Figure 3. During pre-conditioning, CPP scores for individual mice tended to aggregate near zero, with variability contributed by 2-4 mice at each dose condition. However, a more distributed set of CPP scores at each dose was observed in the post-conditioning tests. Two-way repeated measures ANOVA showed no significant differences of dose or conditioning phase [e.g., dose × conditioning phase interaction: F (2, 33) = 0.593, p = 0.558].
FIGURE 2. Results from the test session of the elevated plus maze-discriminative avoidance task, conducted 24 h after the training session, during which animals received i.p. administration of vehicle or midazolam at doses of 1 (MDZ 1), 2 (MDZ 2) or 5 (MDZ 5) mg/kg. No drugs were administered before the test session. Time spent in the aversive vs. non-aversive closed arms (memory retrieval). Data are shown as mean ± SEM. * indicates a significant difference compared to Vehicle within the same parameter; • indicates a significant difference compared to time spent in the aversive closed arm within the same group.

Because the CPP score represents a dichotomous variable, with positive numbers indicating preference and negative numbers indicating aversion, we explored the possibility of mice falling into distinct categories. Two-step clustering analysis was conducted separately for each dose of midazolam (1.0, 2.0, 5.0 mg/kg). Schwarz Bayesian Criterion (BIC) reached acceptable clustering with two centroids for all three doses. Figure 4 shows the results of this cluster analysis, with cluster 1 = aversion, i.e., negative numbers, and cluster 2 = preference, i.e., positive numbers. With some exceptions at 1.0 and 5.0 mg/kg, all CPP scores fell above or below zero, depending on the cluster. Based on the analysis, 66.7-75% of mice were grouped into the preference category (Figure 4, top panels), with frequency distributions showing the majority of subjects with CPP scores above zero and low degrees of overlap (Figure 4, middle panels). Silhouette analysis of cluster cohesion and separation showed index scores of 0.79 to 0.82, indicating "good" cluster quality (Figure 4, bottom panels). Repeated analyses with randomized data sets did not alter results.
Importantly, we performed the same 2-step cluster analysis for the pre-conditioning data, based on the assumption that no clustering would be possible prior to any drug conditioning. Single clusters were obtained for the pre-conditioning phase for the 2.0 and 5.0 mg/kg groups, and while two clusters were obtained for the 1.0 mg/kg group, the iterative process identified two outliers (CPP scores of −293 and −392) and resulted in a silhouette score categorized as "poor." Interestingly, the two mice with the outlier scores remained at negative numbers for the post-conditioning test, indicating that they fell into the "aversion" cluster. Because these two subjects did not change to the "preference" cluster (which would result in a substantial change in CPP scores) and this pattern was not evident at the other two doses, we did not exclude the mice from any of the analyses.
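To make the silhouette index more concrete, the following Python sketch computes the textbook mean silhouette coefficient for one-dimensional scores with known cluster labels. It is offered as an illustration only: the statistical package used in the study may compute its reported index differently, and the function name and example scores are hypothetical.

```python
def mean_silhouette(values, labels):
    """Mean silhouette coefficient for 1-D data (textbook definition).

    For each point: a = mean distance to the other points in its own
    cluster, b = smallest mean distance to the points of any other
    cluster, s = (b - a) / max(a, b). Points in singleton clusters get 0.
    """
    clusters = {}
    for v, lab in zip(values, labels):
        clusters.setdefault(lab, []).append(v)
    if len(clusters) < 2:
        raise ValueError("at least two clusters are required")

    scores = []
    for v, lab in zip(values, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)
            continue
        # The point's distance to itself is 0, so divide by len(own) - 1.
        a = sum(abs(v - u) for u in own) / (len(own) - 1)
        b = min(
            sum(abs(v - u) for u in other) / len(other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        denom = max(a, b)
        scores.append(0.0 if denom == 0 else (b - a) / denom)
    return sum(scores) / len(scores)


# Invented CPP scores: two aversion-like and three preference-like mice.
quality = mean_silhouette([-300, -280, 200, 220, 240], [1, 1, 2, 2, 2])
```

Values close to 1 indicate compact, well-separated clusters, whereas values near or below 0 indicate overlapping or poorly assigned clusters.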
We performed additional internal validation tests that also provided information on dose-dependency (Figure 5). For these analyses, the clusters were analyzed with separate repeated measures ANOVAs. For cluster 1 (aversion; Figure 5

Based on the cluster analysis, the majority of mice showed preference for the midazolam-paired compartment, an effect offset by mice showing aversion. It was notable that there was variance in the pre-conditioning phase, with one group even displaying 2 clusters, raising the likelihood that the mice demonstrated preference or aversion in the absence of drug conditioning. To evaluate the nature of the change in CPP score from pre- to post-conditioning, we first coded mice with one of three numbers according to the following categories: −1.0, mice showing positive scores in pre-conditioning and negative scores in post-conditioning; 0, mice that stayed either positive or negative in pre-conditioning and post-conditioning; +1.0, mice showing negative scores in pre-conditioning and positive scores in post-conditioning. A frequency histogram was plotted (Figure 6), showing that for each dose, the majority of mice did not change from pre- to post-conditioning, i.e., if they showed a negative pre-conditioning score, they showed a negative post-conditioning score. The next highest frequency was for mice showing a change from negative to positive CPP scores after conditioning, with a smaller number of mice (25% for all three doses) demonstrating a shift from positive to negative CPP scores.
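The three-category coding described above can be written as a small Python function. This is a sketch under one explicit assumption that the text does not address: a score of exactly zero is treated as non-positive. The function name and the example scores are hypothetical.

```python
from collections import Counter


def shift_code(pre_score, post_score):
    """Code the change in CPP score sign from pre- to post-conditioning.

    -1: positive in pre-conditioning, negative in post-conditioning
     0: same sign in both phases
    +1: negative in pre-conditioning, positive in post-conditioning
    Assumption: a score of exactly 0 is treated as non-positive.
    """
    pre_positive = pre_score > 0
    post_positive = post_score > 0
    if pre_positive and not post_positive:
        return -1
    if post_positive and not pre_positive:
        return 1
    return 0


# Invented (pre, post) CPP scores for four mice.
pairs = [(120, -40), (-80, 60), (30, 90), (-10, -200)]
frequencies = Counter(shift_code(pre, post) for pre, post in pairs)
```

The resulting counts per code are the quantities that a frequency histogram of this kind would display.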

Comparison of parameters from conditioned place preference and elevated plus maze-discriminative avoidance tasks
In order to obtain information regarding potential behavioral mechanisms underlying midazolam-induced conditioned place preference, we conducted additional analyses to determine the extent to which CPP score in mice showing place preference could

Table 1 for the 4 measures, conducted within each dose of midazolam. As evident from the table, no correlations were significant with CPP score, although significant positive correlations were obtained for time in open arms for memory score at 1.0 mg/kg midazolam and learning score at 5.0 mg/kg of midazolam.
To test specifically whether CPP score was a reliable predictor of EPM-DA parameters, individual linear regression analyses were conducted for the cluster 2 (preference) mice (Table 2). In every case, the regression parameter values were not significantly different from zero, with relatively low goodness-of-fit values (R²). Therefore, no evidence was obtained from this data set to support the hypotheses that CPP reflects anxiolytic or learning- and memory-associated processes.
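For readers who want to see how the correlation and regression quantities relate, the sketch below computes Pearson r, the least-squares slope and intercept, and R² for a single predictor. It is a minimal illustration, not the analysis pipeline used in the study: the function name and data are hypothetical, and no significance test is included.

```python
from math import sqrt


def pearson_and_regression(x, y):
    """Pearson r plus simple least-squares fit of y on x.

    x: predictor (e.g., CPP score); y: outcome (e.g., time in open arms).
    Returns (r, slope, intercept, r_squared).
    """
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("x and y must have equal length of at least 3")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each vary")
    r = sxy / sqrt(sxx * syy)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return r, slope, intercept, r * r


# Invented, perfectly linear data used only to check the arithmetic.
r, slope, intercept, r2 = pearson_and_regression([1, 2, 3, 4], [3, 5, 7, 9])
```

With a single predictor, R² equals the square of Pearson r, so a slope that does not differ from zero goes together with a low goodness-of-fit value.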

Experiment 2. Evaluation of midazolam choice behavior
Results from TBC tests with both Swiss and C57BL/6 mice cohorts are shown in Figure 7. For Swiss mice (Figure 7, left panel), ANOVA revealed a significant effect of concentration [F (2,56) = 4.383, p = 0.030]; however, no multiple comparisons were significant (p's > 0.05, Bonferroni t-tests). For C57BL/6 mice, the overall ANOVA was not significant [F (2,84) = 0.099, p = 0.899]. However, as with CPP in Swiss mice, midazolam preference was highly variable, with mice showing both preference (above 50%) and avoidance (below 50%). We used 2-step cluster analyses to evaluate the extent to which mice in these studies could be parsed into those that preferred midazolam above 50% levels and those that avoided consuming the drug. For Swiss mice, all three concentrations resulted in 2 clusters (Figure 8). In general, the distribution of mice into the two clusters was approximately equal, with very little overlap among clusters (Figure 8, top and middle panels). The differences in percent midazolam preference between clusters were confirmed by unpaired t-tests [0.008 mg/ml: t (27) = 14.01, p < 0.0001; 0.016 mg/ml: t (27) = 8.596, p < 0.0001; 0.032 mg/ml: t (27) = 7.556, p < 0.0001]. For all three concentrations, the silhouette scores were in the "good" range (i.e., >0.5; Figure 8, bottom panels).
A noteworthy characteristic of the TBC data is the observation that different mice were in different clusters across the concentrations. This is expected, because many mice showed mixtures of preference and avoidance depending on the concentration, as is a fundamental characteristic of self-administration data over all drug classes and most procedures. To quantify this phenomenon, we coded each mouse as showing "same" or "mixed" effects. "Same" indicated a mouse for which all three concentrations were either above 50% preference or ≤50% preference. "Mixed" indicated a mouse for which at least one concentration differed from the other concentrations. For example, a mouse with preference above 50% for 0.016 mg/ml but below 50% for the other two concentrations was coded as "mixed." For the two strains, we compared the two clusters by conducting Fisher's exact tests. As shown in Figure 10, for Swiss mice, both same and mixed categories were observed about equally and did not differ between the clusters (Fisher's exact test, p > 0.05). Interestingly, for C57BL/6 mice, cluster 1 (preference) mice were predominantly in the mixed category, whereas cluster 2 (avoidance) mice were infrequently coded as mixed (Fisher's exact test, p = 0.0002). This analysis indicates that for Swiss TBC results, effects dependent on dose accounted for half of the subjects in both clusters, whereas with C57BL/6 mice, effects were dependent on dose predominantly for the mice showing preference for the midazolam solutions.
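The "same" vs. "mixed" coding and the comparison between clusters can be sketched as follows. This is an illustration only: the function names are hypothetical, the counts in the example table are invented rather than taken from Figure 10, and the two-sided p-value uses the common convention of summing the probabilities of all tables that are no more likely than the observed one, which is an assumption about how the test was run.

```python
from math import comb


def code_mouse(preferences, threshold=50.0):
    """Return "same" if all concentrations fall on one side of the
    threshold (above, or at/below), otherwise "mixed"."""
    above = [p > threshold for p in preferences]
    return "same" if all(above) or not any(above) else "mixed"


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows could be clusters and columns "same"/"mixed" counts. The p-value
    sums the hypergeometric probabilities of every table with the same
    margins that is as likely as, or less likely than, the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(k):
        return comb(row1, k) * comb(row2, col1 - k) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    total = sum(
        prob(k) for k in range(lo, hi + 1) if prob(k) <= p_obs * (1 + 1e-9)
    )
    return min(1.0, total)


# Invented example: preference percentages at the three concentrations.
label = code_mouse([40.0, 62.5, 35.0])
# Invented 2x2 counts: cluster 1 (3 same, 0 mixed), cluster 2 (0 same, 3 mixed).
p_value = fisher_exact_two_sided(3, 0, 0, 3)
```

Coding each mouse first and then tabulating the codes by cluster keeps the categorical comparison separate from the clustering step itself.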

Discussion
Despite the clear and growing concern over benzodiazepine misuse, investigating the abuse-related effects of these drugs has not been as straightforward as researchers might expect. The pre-clinical literature on the effects of benzodiazepines in animal models has been filled with contradictory findings, and establishing models to investigate benzodiazepine reward and reinforcement has been a challenge. Specifically, studies have shown opposite effects for benzodiazepine self-administration (5-9), benzodiazepine-induced CPP (10-14) and changes in brain dopamine levels induced by benzodiazepines (15-20).
In the present study, assessment of midazolam reward via the CPP procedure and choice via the TBC procedure showed notable variability, with evidence that mice developed CPPs or conditioned place aversions (CPAs) with midazolam exposure and, similarly, preferred or avoided midazolam in the TBC model. We evaluated the extent to which mice could be divided into broadly different categories, i.e., midazolam-preferring vs. midazolam non-preferring, using 2-step cluster analysis. This approach was used because it does not require a priori choice of number of

FIGURE 10. Mice coded as "same" (indicating a mouse for which all three concentrations were either above 50% preference or ≤50% preference) or "mixed" (indicating a mouse for which at least one concentration differed from the other concentrations). Data are separated by cluster 1 (preference) and cluster 2 (avoidance) for the two strains (Swiss and C57BL/6 male mice). # indicates a significant difference (Fisher's exact test).

in the preference cluster, whereas the aversion cluster generally showed robust aversions with no dose-related effects. The majority of mice demonstrated CPP by an increase in time spent in the midazolam-paired compartment, mostly by increasing time spent in the particular chamber rather than by shifting preference from one chamber to another. This latter observation suggests that mice already showing an aversion to a drug-paired chamber may not be likely to change to a preference; however, the mice in the aversion cluster mostly showed increased time in the non-drug-paired side instead of no change from pre-conditioning tests.
The distinct clusters observed for the CPP experiment allowed us to assess the relationship between reward and other characteristic effects of benzodiazepines. In this regard, midazolam had anxiolytic-like effects in mice, increasing the time spent in the open arms of the modified EPM apparatus, consistent with previous studies (38,39). To investigate the relationship between the anxiolytic-like and rewarding effects of midazolam, we conducted correlational analysis as well as regressed CPP scores vs. time spent in the open arms of the EPM. These analyses showed no relationship between midazolam reward and this measure of anxiolytic-like effects, suggesting that the emergence of anxiolytic-like effects is not sufficient to guarantee the expression of rewarding effects. Interestingly, strong positive correlations were shown for learning and memory scores vs. time in the open arms, suggesting that a stronger anxiolytic-like effect was associated with a higher degree of learning and memory impairment. In fact, midazolam reducing the aversiveness of the open arm may play a key role in any learning/memory impairment associated with this particular task.
The finding that midazolam impaired learning and memory of a discriminative avoidance task is consistent with previous pre-clinical studies with midazolam (28) and other benzodiazepine-type drugs (24,29,40). Because the CPP model relies on associative learning, we also investigated a potential correlation between the rewarding (CPP) and cognitive-impairing (discriminative avoidance task) effects of midazolam. As with anxiolytic-like measures, we found no significant relationships between these two measures, suggesting that the rewarding and aversive effects of midazolam emerged despite significant learning deficits induced by this drug. In addition to testing the hypotheses that midazolam reward is associated with its anxiolytic and cognitive-impairing effects, these comparisons potentially provided tests of external validity for the 2-step cluster approach. Clearly, these findings did not provide external support for the clustering, with the lack of a relationship between CPP and cognitive effects perhaps the most perplexing. However, it is critical to note that the effects of midazolam in the learning and memory components of the discriminative avoidance task were to impair these processes, whereas CPP and CPA involve forming associative pairings. Moreover, learning to avoid an open arm may represent a form of fear conditioning, as opposed to the reward learning represented by CPP, which was the outcome for 75% of the mice. While neural circuits mediating aversive and reward learning may overlap, there likely are distinct functional differences [e.g., (41)]. Regarding anxiolysis, the hypothesis that the expression of reward may reflect reductions in anxiety is based primarily on self-report data from human subjects identifying motives for taking benzodiazepines [e.g., (4)], rather than on data from laboratory animal studies. Collectively, these observations do not provide external validity for the clustering but also are insufficient to discount the clustering approach, given that anxiolysis and learning/memory were components of hypothesis testing and not empirical conclusions per se.
External validation of mice being categorized as midazolam-preferring vs. midazolam-averse comes primarily from the TBC experiments. Two-step cluster analysis demonstrated that two different strains of mice overall fell into two clusters identified as midazolam-preferring or midazolam-avoiding groups, with only one exception being the lowest concentration of midazolam tested in C57BL/6 mice, which resulted in an additional (third) cluster characterized as indifference (i.e., equal distribution of drinking from midazolam + sucrose and sucrose alone bottles). Both midazolam preference and avoidance were concentration-dependent in a subset of mice, with some showing preference at some concentrations but avoidance at others. However, there was a trend for this pattern to occur more frequently in the midazolam-preferring Swiss mice and a statistically significant difference between midazolam-preferring and midazolam-avoiding C57BL/6 mice, suggesting that mice in the midazolam-avoiding groups tended to only show avoidance regardless of the concentration of drug.
Wild-type laboratory mice can be divided into two main genetic categories: inbred and outbred (42). Inbred mice, such as the C57BL/6 mouse strain, are genetically homogeneous, with little genetic variation within the strain, which can reduce experimental variability and allow for the evaluation of genetic influences on specific behavioral phenomena. Outbred mice, such as the Swiss mouse strain, are bred specifically to maximize genetic diversity and heterozygosity within a population and, in theory, no two outbred subjects are genetically identical. Therefore, the use of genetically heterogeneous and homogeneous strains allowed us to assess whether genetic factors could influence the expression of midazolam preference vs. avoidance. Our findings showed that both inbred and outbred mice demonstrated a strikingly similar pattern of preference and avoidance in the TBC experiments, even with the two TBC studies conducted at separate facilities. These studies ruled out a potential influence of genetic factors in our findings, raising the possibility that midazolam preference vs. avoidance groupings may develop in mice due to epigenetic factors.
The present findings corroborate a recent study in non-human primates showing that only half of the subjects self-administered the benzodiazepine alprazolam intravenously, although that study was conducted in rhesus monkeys with a history of opioid self-administration (43). These findings are also in agreement with a choice study in humans showing that, while diazepam was always preferred over placebo, placebo was preferred over oxazepam in nearly 22% of choice tests by recreational benzodiazepine users (44). The mechanisms underlying these contrasting findings within a study remain unknown. However, the unique pharmacokinetic properties of midazolam and other benzodiazepines may have contributed to these results. Due to its pharmacokinetic and pharmacodynamic properties, midazolam induces hysteresis, which results in a delay between the peak drug serum concentrations and the peak drug behavioral effects (45). Hysteresis indicates that the relationship between drug concentration and drug effects is not straightforward and direct, but may have an inherent delay and imbalance, which may be a result of active metabolites or a consequence of changes in pharmacodynamic properties (45). Importantly, studies have shown that hysteresis influences benzodiazepine self-administration in rats (46). Of note, hysteresis also has been reported for both alprazolam (47) and oxazepam (48). Although further studies are needed to understand how this specific effect could affect some animals but not others, these pharmacokinetic and pharmacodynamic mechanisms may have influenced our findings.
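The delay between peak serum concentration and peak effect described above can be illustrated with a generic effect-compartment model. The sketch below is not drawn from the present study or from the cited work: the function name `simulate_effect_delay`, the rate constants `ka`, `ke` and `ke0`, and the time units are hypothetical values chosen only for illustration, and they are not midazolam parameters. Serum concentration follows first-order absorption and elimination, and the effect-site concentration equilibrates with serum at rate `ke0`, so it peaks later than serum.

```python
import math

def simulate_effect_delay(ka=2.0, ke=0.5, ke0=0.3, dt=0.001, t_end=24.0):
    """Return (time of peak serum level, time of peak effect-site level).

    All rate constants are illustrative assumptions (arbitrary units),
    not measured values for midazolam or any other benzodiazepine.
    """
    ce = 0.0                      # effect-site concentration
    peak_cp, t_peak_cp = 0.0, 0.0
    peak_ce, t_peak_ce = 0.0, 0.0
    steps = int(t_end / dt)
    for i in range(steps + 1):
        t = i * dt
        # Serum concentration for a unit oral dose (first-order absorption
        # and elimination in a one-compartment model).
        cp = ka / (ka - ke) * (math.exp(-ke * t) - math.exp(-ka * t))
        if cp > peak_cp:
            peak_cp, t_peak_cp = cp, t
        if ce > peak_ce:
            peak_ce, t_peak_ce = ce, t
        # Effect site equilibrates with serum at rate ke0 (Euler step).
        ce += ke0 * (cp - ce) * dt
    return t_peak_cp, t_peak_ce

t_serum, t_effect = simulate_effect_delay()
print(f"peak serum at t = {t_serum:.2f}, peak effect site at t = {t_effect:.2f}")
```

Under these assumptions the effect-site peak always follows the serum peak, so the same serum concentration corresponds to different effect levels on the rising and falling limbs of the curve. Active metabolites or changes in pharmacodynamic properties, both mentioned above, would require a different model.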
Overall, our findings show that midazolam preference is a multifactorial behavior, and is not dependent solely on the emergence of anxiolytic-like effects, or on genetic factors (inbred vs. outbred animals). Also, the rewarding effects of midazolam in the CPP model emerged even at doses that induced significant learning deficits in mice. The protocols established in the present study can be used in future research to evaluate the neuropharmacological mechanisms involved in the different behavioral effects of benzodiazepine drugs within the same group of animals. Of note, an important limitation of our study is that sex differences were not investigated, and different results may have been obtained for female mice. Also, the sample size in our CPP studies limited some of our analyses, and future studies should consider including multiple cohorts of animals to increase sample size in order to better capture benzodiazepine-induced CPP vs. CPA in mice. Regardless, our data emphasize the importance of considering interindividual variability within a sample, and suggest that variability may be inherent to the study of the abuse-related behavioral effects of benzodiazepines. Embracing variability may provide new avenues of study and a better understanding of how and why benzodiazepine drugs are abused.

Data availability statement
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

Ethics statement
This animal study was reviewed and approved by the Institutional Animal Care and Use Committees of UESC (protocol #006/2017) and UMMC (protocol #1395).