Automated Assessment of Pavlovian Conditioned Freezing and Shock Reactivity in Mice Using the Video Freeze System

The Pavlovian conditioned freezing paradigm has become a prominent mouse and rat model of learning and memory, as well as of pathological fear. Due to its efficiency, reproducibility and well-defined neurobiology, the paradigm has become widely adopted in large-scale genetic and pharmacological screens. However, one major shortcoming of the use of freezing behavior has been that it has required the use of tedious hand scoring, or a variety of proprietary automated methods that are often poorly validated or difficult to obtain and implement. Here we report an extensive validation of the Video Freeze system in mice, a “turn-key” all-inclusive system for fear conditioning in small animals. Using digital video and near-infrared lighting, the system achieved outstanding performance in scoring both freezing and movement. Given the large-scale adoption of the conditioned freezing paradigm, we encourage similar validation of other automated systems for scoring freezing, or other behaviors.

"The frightened man at first stands like a statue motionless and breathless, or crouches down as if instinctively to escape observation." -Charles Darwin, The Expression of the Emotions in Man and Animals, 1872 (pp. 290-291).

Introduction
The rapid growth of large-scale genetic and pharmacological screening approaches has necessitated the development of efficient, accurate, automated phenotyping tools in rats and mice, including tools for the assessment of behavior, and assays of cognitive function (Clark et al., 2004; Crabbe and Morris, 2004; Tecott and Nestler, 2004; Reijmers et al., 2006; Matynia et al., 2008; Gale et al., 2009; Page et al., 2009). Research areas generating a high level of interest include the cognitive effects of genetic and pharmacological manipulation, especially effects on memory. In terms of drug development, assays of cognitive function serve a dual role: (1) providing characterization of important pharmacological targets (e.g., memory) for diseases such as Alzheimer's or attention-deficit hyperactivity disorder, and (2) detection of unwanted cognitive side effects by using the assay as a toxicological screen (Takahashi, 2004; Shuman et al., 2009; Wood and Anagnostaras, 2009). In recent years, Pavlovian fear conditioning has become a leading model by which to study memory (Anagnostaras et al., 2000; Maren, 2008). Indeed, to a large degree the paradigm has displaced other tasks because of its efficiency and reproducibility. In Pavlovian fear conditioning, an initially neutral conditional stimulus (CS; such as a tone) is paired with a fear-inducing, aversive unconditional stimulus (US; usually a footshock) in a novel chamber. After pairing, the animal develops a long-lasting fear of the discrete tone CS, known as tone or cued fear, as well as a fear of the environmental chamber, which has come to be known as contextual fear.
Fear is often measured as freezing (defined as the suppression of all movement except that required for respiration) (Curti, 1935, 1942; Grossen and Kelley, 1972; Fanselow and Bolles, 1979; Fanselow, 1984), a prominent species-specific defense reaction in both rats and mice, with a long history of study (discussed below) (Bolles, 1970).
Contextual fear has garnered a very high level of interest because it is dependent on the hippocampus and, as such, has become a leading model of declarative memory. As with human declarative memory, hippocampal lesions produce a time-limited and selective deficit of contextual fear, such that lesions made 1 day after training produce a severe retrograde amnesia of contextual fear, but those made one month or more after training produce little or no deficit (Kim and Fanselow, 1992; Anagnostaras et al., 1999b). Cued memory usually does not depend on the hippocampus, but can, especially in trace conditioning, or when the ventral hippocampus is included (Maren, 1999; Maren and Holt, 2004; Quinn et al., 2008; Esclassan et al., 2009). In contrast, both contextual and tone fear depend on the amygdala for the lifetime of the animal (Gale et al., 2004; Poulos et al., 2009). Pavlovian fear conditioning is an effective assay for memory enhancements and deficits because: (1) The task is robust in rats and mice, with conditioning occurring even after a single trial (Anagnostaras et al., 2000).
(2) The assay is not labor intensive; for example, a training session lasts 3-10 min, with 1-15 tone-shock pairings, as compared to several training days for most other forms of hippocampus-dependent memory.
(3) The equipment is readily available and compact, allowing many animals to be tested at once (up to 16 in a small office-sized room).
(4) The behavioral psychology of fear conditioning is well understood, and the stimuli well controlled, as it is one of the most studied forms of learning. Fear conditioning experiments form the cornerstones of contemporary theories of learning (e.g., Rescorla, 1968), and contextual learning has also been theoretically explored (Nadel and Willner, 1980).
(5) The assay is a model for both learning and memory, as well as pathological fear such as in the anxiety disorders (Anagnostaras et al., 1999a; Fendt and Fanselow, 1999).
(6) The learning episode is punctate, and the memory is very long-lasting, allowing for the study of memory phases with high temporal resolution (Bourtchuladze et al., 1994; Kida et al., 2002; Frankland et al., 2006; Reijmers et al., 2007; Matynia et al., 2008; Cai et al., 2009a,b).
(7) Some form of fear conditioning is shown by all animals in which it has been attempted, including Drosophila (Davis, 2005), C. elegans (Wen et al., 1997), and Aplysia (Walters et al., 1981), allowing for the possibility that some mechanisms are conserved across the entire animal kingdom (Davis, 2005; LeDoux, 2002).
(8) The neuroanatomy and molecular biology of fear conditioning have been extensively studied, offering a rich basis for future experiments (Selden et al., 1991; Phillips and LeDoux, 1992; Bourtchuladze et al., 1994; Kogan et al., 1997; Anagnostaras et al., 1999c; Kida et al., 2002; Miller et al., 2002; Frankland et al., 2004, 2006; Davis, 2005; Reijmers et al., 2006, 2007; Maren, 2008; Ehninger and Silva, 2009).
One aspect of the fear conditioning paradigm that may be improved is the measurement of freezing behavior. Historically, freezing has been scored by human observers, or by a variety of proprietary and often unvalidated automated systems. For manual scoring, rodent freezing is measured as percent time freezing for a given test period, which can be measured continuously with a stopwatch, or by instantaneous time sampling every 3-10 s (e.g., Bolles and Riley, 1973;Fanselow and Bolles, 1979;Bouton and Bolles, 1980;Collier and Bolles, 1980;Fanselow, 1980;Sigmundi et al.,1980;Phillips and LeDoux, 1992).
Although manual scoring has proven highly reliable, it is time consuming and tedious for experimenters and could lead to unwanted variability or bias. A number of automated systems have been proposed, which measure movement either using video algorithms or other forms of movement detection, and then threshold a particularly low movement value as freezing.
Despite the apparent simplicity of this task, a reliable and well-validated automated scoring system has been difficult to develop. In a prior report, Anagnostaras et al. (2000) detailed an accurate automated system and argued that very high correlation and excellent linear fit (intercept of 0 and slope of 1) between human and automated freezing scores were essential, as was the ability to score very low freezing (i.e., detect small movements), or very high freezing (detect no movement). Likewise, a system meeting these criteria would produce mean computer and human values (for group data) that are nearly identical (e.g., Anagnostaras et al., 2000, their Figure 3, and our Figure 4 below). Correlation alone is insufficient, because high correlation can be achieved with scores that are on a totally different scale, with a non-linear shape, and only across a small range of freezing values (Anscombe, 1973; Bland and Altman, 1986; Anagnostaras et al., 2000; Marchand et al., 2003). Although the system of Anagnostaras et al. (2000) scored freezing well, it had several limitations, most prominently that it was not commercially available, and thus not feasible to update, distribute or support.
Unfortunately, even 10 years later, few systems are well validated. Table 1 is an overview of the systems in mice for which we could find some validation published in scientific journals. A few have good validation and seem to score freezing well (Anagnostaras et al., 2000; Kopec et al., 2007; Pham et al., 2009). For others, no linear fit between human and automated scores is given, or it is unexceptional (Milanovic et al., 1998; Valentinuzzi et al., 1998; Stiedl et al., 1999; Fitch et al., 2002; Nielsen and Crnic, 2002; Misane et al., 2005). For example, Valentinuzzi et al. (1998) and Misane et al. (2005) use photobeam-based systems that produce fairly high intercepts (∼20%). In the case of Valentinuzzi et al. (1998) the system effectively doubles human scores as well (see their Figure 2). Although both authors suggest that the data could be transformed by subtracting the intercept, this would yield negative freezing scores for some animals, and instead Misane et al. (2005) rightly treat it as a separate, but related measure, immobility. Photobeam-based systems that have detectors placed 13 mm or more apart may have difficulty achieving the spatial resolution needed to detect the very small movements (such as minor grooming or head sway) that are still not considered freezing (Marchand et al., 2003). Other automated systems did not report any human scores, and just demonstrated that their system could produce freezing scores of some sort (Richmond et al., 1998). Still others have not undertaken the step to fully publish a proper validation of their system as is undertaken here. Finally, other systems have been validated to some extent for rats, and may or may not be effective for mice (Maren, 1998, 2001; Takahashi, 2004).
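The argument that correlation alone is insufficient can be illustrated with a minimal sketch. The scores below are hypothetical values chosen only to mimic a system that doubles human scores and adds a 20-point intercept; they are not data from any of the cited studies, and the function name is an assumption made for illustration.

```python
from math import sqrt

def fit_and_correlation(x, y):
    """Return (slope, intercept, r) for an ordinary least-squares fit of y on x."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    r = sxy / sqrt(sxx * syy)
    return slope, intercept, r

# Hypothetical human scores (% freezing) for five observation blocks.
human = [0.0, 10.0, 20.0, 30.0, 40.0]
# Hypothetical biased scorer: doubles the human score and adds 20 points.
biased = [2.0 * h + 20.0 for h in human]

slope, intercept, r = fit_and_correlation(human, biased)
print(f"slope={slope:.2f} intercept={intercept:.2f} r={r:.3f}")
```

In this contrived case the correlation is perfect even though the scorer is badly miscalibrated, which is why the slope and intercept need to be reported alongside r.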
Here we report on the performance of the VideoFreeze system, developed in collaboration with Med-Associates Inc. This system includes significant advances, including (1) a turn-key behavioral system, with dedicated fear conditioning software, and everything needed to run and score fear conditioning experiments including chambers, software, lighting, ventilation, and environmental modifications, (2) progressive-scan digital video which eliminates problems associated with analog video and video storage, and reduces lighting sensitivity, and (3) LED-based white and near-infrared lighting, within an enclosure, along with a visible light filter for the camera, which ensures that the camera treats all lighting conditions as equal. This latter modification allows visible lighting conditions inside the chamber to be altered dramatically without affecting the camera's image or how the computer scores movement or freezing. In this report, we validate the ability of this novel system to accurately score freezing. And, as in Anagnostaras et al. (2000) we document the use of this automated system to accurately assess locomotor activity, activity suppression, and shock reactivity, as auxiliary measures of fear conditioning.

Materials and Methods
Subjects
Twenty hybrid C57BL/6Jx129T2SvEms/J (129B6, stock from the Jackson Laboratory, West Sacramento, CA, USA) male and female (approximately equal numbers) agouti mice were used for this experiment. Mice were weaned 3 weeks after birth and were at least 10 weeks old at the time of testing. Mice were group housed (two to five mice per cage) with unrestricted access to food and water under a 14:10-h light-dark cycle. All animal care and experimental procedures were approved by the University of California, San Diego Institutional Animal Care and Use Committee and were in accordance with the National Research Council Guide for the Care and Use of Laboratory Animals.

Fear conditioning
Four mice were tested concurrently in individual conditioning chambers within a single room. Each clear polycarbonate (top, front), white acrylic (back), and stainless steel (sides, shock grids, drop pan) conditioning chamber (32 cm wide, 25 cm high, 25 cm deep; Med Associates Part Number VFC-008) was encased in a white sound-attenuated box (63.5 cm wide, 35.5 cm high, 76 cm deep; NIR-022MD) and was equipped with a speaker in the side wall and a stainless steel grid floor (36 rods, each rod 2-mm diameter, 8-mm center to center) and drop-pan. A proprietary overhead LED-based light source (Med Associates NIR-100) provided visible broad spectrum White Light (450-650 nm) and near-infrared light (NIR; 940 nm) (Zurn et al., 2007). Lab mice have color vision with cones of maximal sensitivity to ultraviolet (370 nm, short) and bluish-green (510 nm, middle) light, as well as a small number of melanopsin-expressing photoreceptors regulating circadian rhythm (480 nm), with little or no vision in the NIR range (rats are similar) (Jacobs et al., 1991; Hattar et al., 2003; Gouras and Ekesten, 2004). Mouse rods also have

[Table 1 residue: Vargas-Irwin and Robles (2009); Maren, 1998, 2001; Marchand et al., 2003; Takahashi, 2004]

and Kelley, 1972; Bolles and Riley, 1973; Bolles and Collier, 1976; Fanselow and Bolles, 1979). This definition was developed in rats but is identically applied to mice (DeLorey et al., 1998; Anagnostaras et al., 2003). Every 2 s, a lap-interval timer signaled the experiment-blind observer to score freezing for a given animal at that moment. The observer rotated scoring among the four chambers being viewed, resulting in a single freezing score per mouse every 8 s. Percent freezing for a given period was then calculated by dividing the total number of freezing bouts by the total number of scores for each mouse (… Bolles, 1979, 1980; Fanselow and Bolles, 1979; Sigmundi et al., 1980). Two observers with inter-observer reliability of 0.94 scored behavior and their scores were averaged to generate a single human score. This sampling procedure is an efficient way of estimating continuous observation with a stopwatch (i.e., turning a count-up timer on whenever an animal froze and turning it off whenever the animal started moving, then taking the total time freezing over the total observation time) (Bolles and Riley, 1973; Bolles and Collier, 1976; Phillips and LeDoux, 1992). Our test periods were divided into four blocks: (1) Training Baseline, the first 2 min of training prior to the first tone-shock pairing; (2) Post-Shock/Context, the last 5 min of training, after the shock; (3) Tone-Baseline, the first 2 min of the tone test, before the tone was turned on; and (4) Tone, the 3 min when the tone was on during the tone test. Each block was treated as a separate observation for correlational and linear fit analysis (i.e., there were 80 total human-computer pairs of observations, four pairs for each mouse). This approach allowed us to have observations from both the computer and human scorers at a variety of low and high levels of freezing.
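The instantaneous time-sampling arithmetic described above can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, and it assumes the first timer signal falls at time zero and that chambers are visited in a fixed rotation.

```python
def sampling_times(block_s, n_chambers=4, tick_s=2.0):
    """Times (s) at which each chamber is scored when the observer rotates on every tick.

    Assumption: the first tick occurs at t = 0 and chambers are visited in order.
    """
    times = {chamber: [] for chamber in range(n_chambers)}
    n_ticks = int(block_s // tick_s)
    for k in range(n_ticks):
        times[k % n_chambers].append(k * tick_s)
    return times

def percent_freezing(observations):
    """Percent freezing from a list of 0/1 instantaneous judgments (1 = freezing)."""
    return 100.0 * sum(observations) / len(observations)

# A 2-min block scored every 2 s across four chambers: one score per mouse every 8 s.
schedule = sampling_times(120)
print(len(schedule[0]), schedule[1][:3])
print(percent_freezing([1, 0, 1, 1]))
```

Under these assumptions a 2-min block yields 15 observations per mouse, so a single observation moves that block's percent freezing by roughly 6.7 points.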

Computer scoring
A proprietary motion analysis algorithm was used to generate a Motion Index from the digital video stream in order to estimate the amount of mouse movement. This algorithm analyzed the video stream in real time, as it was being saved to disk, and it was capable of analyzing up to four video cameras simultaneously recording at 30 frames per second, 320 × 240 pixels, 8-bit grayscale. Briefly, a reference video sample is taken prior to placing the mouse into the chamber ("calibration"). This reference sample establishes the amount of baseline noise inherent in the video signal on a per pixel basis, across multiple successive frames. Once the mouse is placed in the chamber, successive video frames are continuously compared to each other and to the reference sample on a pixel by pixel basis. Any differences between pixels in the current video signal larger than those in the reference sample are interpreted as animal movement. These differences (in pixels) are summed for each image frame, and this summation is counted as the Motion Index. The Motion Index is the number of pixels that have changed within a designated time period more than they would change if the mouse was not present (i.e., video noise). As detailed below, the Motion Index is subjected to a Motion Index Threshold to generate freezing scores.
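Because the actual algorithm is proprietary, the following is only a rough sketch of the general idea described above, not the Video Freeze implementation. It assumes that the calibration sample is reduced to a per-pixel noise floor (the largest frame-to-frame change seen with no animal present) and that the Motion Index counts pixels whose change between successive frames exceeds that floor. The function names and the tiny 2 × 3 frames are hypothetical.

```python
def pixel_noise(calibration_frames):
    """Per-pixel noise floor: largest frame-to-frame change with no animal present."""
    rows = len(calibration_frames[0])
    cols = len(calibration_frames[0][0])
    noise = [[0] * cols for _ in range(rows)]
    for prev, cur in zip(calibration_frames, calibration_frames[1:]):
        for r in range(rows):
            for c in range(cols):
                noise[r][c] = max(noise[r][c], abs(cur[r][c] - prev[r][c]))
    return noise

def motion_index(prev_frame, cur_frame, noise):
    """Count pixels whose change between successive frames exceeds the noise floor."""
    return sum(
        1
        for r, row in enumerate(cur_frame)
        for c, value in enumerate(row)
        if abs(value - prev_frame[r][c]) > noise[r][c]
    )

# Hypothetical 8-bit grayscale calibration frames of an empty chamber.
calibration = [
    [[10, 10, 10], [10, 10, 10]],
    [[11, 10, 10], [10, 9, 10]],
    [[10, 10, 10], [10, 10, 10]],
]
noise = pixel_noise(calibration)

empty = [[10, 10, 10], [10, 10, 10]]
moved = [[11, 10, 50], [10, 60, 10]]  # two pixels change well beyond the noise floor
print(motion_index(empty, moved, noise))
```

A real implementation would operate on full 320 × 240 frames and would likely use array operations rather than nested loops; the loops are kept here only for readability.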
Computer-derived freezing scores were systematically compared to hand-scored freezing (below) across a range of user-entered parameters in order to generate the best linear fit and correlation between the computer-derived scores and human scores. For video storage, the four streams from the four chambers are saved into one

maximum absorption around 505 nm with no sensitivity beyond 700 nm (Lem et al., 1999). LED-based lighting was chosen for its reliability, longevity, low power consumption, and low heat generation. Background noise (65 dBA) was provided by internal ventilation fans as well as an iPod/speaker combination playing white noise placed centrally in the testing room. Video images of the behavioral sessions were recorded at a frame rate of 30 frames per second (640 × 480, downsampled within the driver to 320 × 240 pixels; about 1 pixel per visible mm²) via an IEEE 1394a (Firewire 400) progressive scan CCD video camera (VID-CAM-MONO-2A) with a visible light filter (VID-LENS-NIR-1) contained within each chamber and connected to a computer in an adjacent room. Downsampling of pixels was necessary for real-time analysis of four chambers but did not adversely affect scoring. In contrast, a reduction in time sampling (e.g., from 30 to 15 Hz) reduced the quality of scores (not shown; see Vargas-Irwin and Robles, 2009). The interior of the training context, and a sample video frame can be seen in Figure 1A. A general activity index (Motion Index; see below) was derived in real time from the video stream by computer software (Video Freeze; SOF-843) running on a Windows computer. After a 2-min baseline period, 3 tone-shock pairings were administered, consisting of a 30-s pure tone (2.8-kHz, 85 dBA) coterminating with a 2-s scrambled footshock (0.75 mA, RMS, AC constant current) delivered through the floor of the cages.
The mice remained in the context in order to score post-shock freezing behavior (which served as a way to measure Context freezing), resulting in a total of 10 min exposure to the conditioning context. Each chamber was cleaned and scented with 7% isopropyl alcohol between trials.

Cued fear testing
Forty-eight hours after training, mice were placed in the original conditioning chambers, modified along a number of dimensions. The modified context contained a smooth white acrylic insert (ENV-005-GFCW) instead of the grid floor, and had a black plastic triangular tent (ENV-008-IRT), translucent only to NIR light, placed inside the chamber. Mice, therefore, perceived their experience in the new context as being in near total darkness, while the camera, which has a visible light filter, saw little difference in lighting from one context to another. The chamber, from the camera's view, can be seen in Figure 1B. Please note, however, that to the unassisted human or mouse eye, the tent appears black, and the chamber is in near total darkness. A 7% white vinegar solution replaced the alcohol solution for cleaning and scenting to provide a novel odor. The mice remained in this new context for a total of 5 min, consisting of a 2-min baseline period, followed by a 3-min presentation of the tone.

Hand scoring of freezing behavior
Freezing behavior of each mouse was scored post hoc by an experimenter using instantaneous time sampling while viewing the digital video playback of the experiment. Four chambers were viewed concurrently during scoring, in order to more closely replicate previous scoring protocols (Fanselow, 1980; Anagnostaras et al., 1999b, 2000). Freezing was defined in the tradition of the R.C. Bolles lab, as the absence of all movement, aside from that required for respiration (without regard to posture) (Grossen

Windows Media Video 9 file (WMV3 codec), 320 × 240 pixels (32 bits) per stream, 30 frames/s, with a variable total bitrate averaging about 1200 kb/s. These videos (Figure 1) require only 2.3 MB/min, per chamber, and therefore thousands of videos can be cheaply stored for many years on digital media.

Results
Titrating parameters
Two algorithm parameters can be adjusted to generate freezing scores. VideoFreeze software now defaults these parameters to the values derived in this study for mice. The two parameters are the Motion Index Threshold (below which freezing is scored), and the number of frames that the Motion Index must remain below the Motion Index Threshold to be considered freezing (Minimum Freeze Duration). In order for the animal to be considered freezing, the Motion Index must remain below the Motion Index Threshold for the Minimum Freeze Duration. In prior work, we found that a Minimum Freeze Duration of one full second (30 frames) produced the scores closest to human observers (who are instructed to make an instantaneous or momentary judgment) (Anagnostaras et al., 2000). For this reason we examined correlation (Figure 2A) and the linear fit equation [y-intercept (b; Figure 2B) and slope (m; Figure 2C)] between human (x) and computer (y) scored freezing under varying conditions of Minimum Freeze Duration at 30 Hz (5, 10, 15, or 30 frames, i.e., 0.17, 0.33, 0.5 or 1 s) and varying Motion Index Thresholds (19-21, from pilot work, which was further refined from 15 to 25 once we settled on 30 frames for the Minimum Freeze Duration). A perfect scoring system would generate a correlation of 1.0, slope of 1.0, and y-intercept of 0 (i.e., y = 1x + 0). We feel that the y-intercept is of primary importance because any nonzero y-intercept would inflate or deflate freezing scores, including baseline freezing, by that amount (Anagnostaras et al., 2000). GraphPad Prism (La Jolla, CA, USA) was used for all analyses. Figure 2A depicts the correlation compared to the number of frames for various motion thresholds. Although the correlations between computer and human scores were always high (0.960-0.972), 30 frames (at 30 Hz = 1 full s) of Minimum Freeze Duration generated the best correlations. We focused on 30 frames for the Minimum Freeze Duration and added additional Motion Index Threshold values, as shown in Figure 2. Figure 2B depicts the y-intercept (b) of the linear fit between computer and human scores for various motion thresholds. Again, 30 frames produced the smallest y-intercepts (-0.55 to 1.13%, compared to up to 8.74% for five frames), with a Motion Index Threshold of 18 producing the intercept closest to 0 (-0.007). Finally, Figure 2C depicts the slope (m) of the linear fit between computer and human scored freezing for various motion thresholds. Thirty frames easily generated the best slope (0.964-0.989, compared to as low as 0.950 for five frames). A Motion Index Threshold of 18 was chosen because of high correlation (0.971) and slope (0.974), with the y-intercept closest to 0 (-0.007%).
Again, the near-zero y-intercept is considered of primary importance because an automated scorer should not systematically generate inflated or deflated baseline freezing scores. Finally, a Minimum Freeze Duration of 30 frames (1 s) was chosen because this always produced the best correlation and linear fit across all conditions. Even when 30 frames are used as the Minimum Freeze Duration, the system still has a one-second resolution for scoring freezing, which is far better than investigators need. Together, the chosen parameters are referred to as (18,30) and are explored further below. The final overall linear model comparing Video Freeze scores to human scores was VideoFreeze (18,30) = 0.974 (Human) - 0.007, r = 0.971, p < 0.0001 (Fisher's r-to-z).

1986). Although we think a combination of linear fit and correlation are also critical, the mean bias (VideoFreeze-Handscore) computed using the Bland-Altman analysis was 0.90% ± 7.22 (μ ± 1 SD), and the Bland-Altman plot for all freezing data is shown (Figure 3C). Bias remained low within the White (1.49 ± 7.23) or NIR-only (0.30 ± 7.25) conditions. These compare favorably with the plotted value of about -5% in the Pham et al. study.
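The two scoring parameters and the Bland-Altman bias reported in this section can be sketched together in a few lines. This is a hedged illustration rather than the Video Freeze code: the function names are hypothetical, the sketch assumes that once a run of sub-threshold frames reaches the Minimum Freeze Duration the whole run is credited as freezing, and the example traces and scores are made up.

```python
from statistics import mean, stdev

def percent_freezing(motion_index, threshold=18, min_frames=30):
    """Percent of frames scored as freezing for one Motion Index trace.

    Assumption: a run of frames with Motion Index below `threshold` counts as
    freezing, in its entirety, only if it lasts at least `min_frames` frames.
    """
    frozen = [False] * len(motion_index)
    run_start = None
    # A sentinel equal to the threshold closes any run still open at the end.
    for i, value in enumerate(list(motion_index) + [threshold]):
        if value < threshold:
            if run_start is None:
                run_start = i
        else:
            if run_start is not None and i - run_start >= min_frames:
                for j in range(run_start, i):
                    frozen[j] = True
            run_start = None
    return 100.0 * sum(frozen) / len(frozen)

def bland_altman(computer, human):
    """Mean bias (computer minus human) and the SD of the differences."""
    diffs = [c - h for c, h in zip(computer, human)]
    return mean(diffs), stdev(diffs)

# Hypothetical 4-s trace at 30 Hz: a 2-s freeze, movement, then a freeze too short to count.
trace = [5] * 60 + [100] * 30 + [5] * 20 + [100] * 10
print(percent_freezing(trace))

# Hypothetical paired block scores (% freezing).
bias, sd = bland_altman([10.0, 52.0, 81.0], [12.0, 50.0, 80.0])
print(round(bias, 2), round(sd, 2))
```

With the (18,30) defaults, the 20-frame pause in the example trace is not scored as freezing, which is the behavior the Minimum Freeze Duration is meant to enforce.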

Ability to estimate group means
Computer and human-scored group means were compared for the test components and are depicted in Figure 4. The training day (Figure 4A) consists of Baseline freezing (the first 2 min of the training day), and Context freezing (the last 5 min of the training day). The Tone Test (Figure 4B) consists of Tone Baseline (the first 2 min of the tone test), and Tone (the period when the tone was on during the tone test). In all cases, the (18,30) algorithm generated means with <2% difference from hand-scored means, and standard errors with <0.5% difference from hand-scored data. Univariate ANOVAs for each measure (comparing human and computer scores) yielded F values <0.2, with p values >0.9. Therefore, the computer and hand scores were virtually identical.

Measurement of locomotor activity
An additional benefit of automated systems is the ability to measure locomotor activity, which can be useful in at least three different ways: First, baseline activity, prior to any tone or shock on the training day, can be a useful measure of general activity effects (such as in the open field). Activity during this period, for example, can readily detect the effects of hippocampal lesion or psychostimulant drugs (Anagnostaras et al., 1999b;Wood et al., 2007;Shuman et al., 2009;Wood and Anagnostaras, 2009). Raw mouse movement, as Motion Index, is shown in Figure 5A, for the initial training baseline, postshock (context) period, tone test baseline, and tone test period.

Linear fit
Training/contextual testing and tone (cued) testing are usually conducted in two different environments, on separate days (e.g., Anagnostaras et al., 1999b). In order to enhance efficiency and reduce cost and space requirements, our setup used the same physical chambers, but these were varied along several dimensions. The training chamber (Figure 1A) had shock grid floors, was lit with near infrared (NIR) and white light, and was scented with a 7% ethanol solution. On a separate day, tone testing was completed in the same chambers, but with a flat white acrylic sheet floor, a triangular teepee-like insert that was translucent to NIR but not visible light (Figure 1B), and a 7% white vinegar (10% acetic acid) solution for scenting. The use of weakly concentrated volatile non-oily odorants with low boiling points and high water solubility (e.g., 5-10% of ethanol, isopropanol, vinegar, ammonium hydroxide), which dissipate quickly and can be easily cleaned, is strongly preferred over the plethora of oily immiscible odorants which do not clear readily. The camera always had a visible light filter, so from the camera's perspective, the lighting conditions remained the same (only NIR light was visible to the camera) during both testing days. From the eye's perspective, however, lighting conditions appear to change dramatically between the contexts, since the mouse's eye has no sensitivity in the NIR range. The tone testing context appears very dark when the visible light is turned off, whereas the other is quite bright. In order to ensure good scoring for the (18,30) parameters in each context, individual linear fit is shown for White Light (Figure 3A), or NIR light (Figure 3B). Linear fit and correlation remained exceptional across lighting conditions. Finally, Pham et al. (2009) argue that the Bland-Altman plot (widely used to estimate agreement in the medical sciences) may be more appropriate. This analysis, also known as the Tukey mean-difference plot, compares the deviation between computer and human scores, which is then called "bias" (Bland and Altman, 1986).

[Figure 2 caption, partially recovered] (B) Intercept. The linear fit between VideoFreeze-scored and human-scored freezing is compared for the y-intercept. The y-intercept is important because it reflects how much the system overestimates or underestimates freezing. Larger number of video frames and lower motion thresholds yielded lower y-intercepts. A threshold of 18 yielded the lowest nonnegative intercept. (C) Slope. The slope term from the linear fit is depicted compared with frames and motion threshold. Larger frame numbers yielded a slope closer to 1. A motion threshold of 18 and number of frames of 30 was chosen for having the best combination of high correlation, intercept close to 0, and slope close to 1. Au, arbitrary units.
use of conditioned suppression as an index of fear, including suppression of spontaneous activity, is well-validated and has some emerging popularity Maren et al., 1998;Anagnostaras et al., 2000Anagnostaras et al., , 2003Frankland et al., 2001;Matynia et al., 2008). When used with appropriate interpretive cautions, suppression scores can be particularly useful for handling situations where baseline differences in activity or freezing exist Anagnostaras et al., 2000;Frankland et al., 2001;Restivo et al., 2009). Figure 5B shows activity suppression for the context and tone test. Finally, the gross motor reactivity to shock, known as the activity burst, or unconditioned response (UR), during the actual 2-s shock, can be used a measure of shock reactivity or pain (DeLorey et al., 1998;Anagnostaras et al., 1999cAnagnostaras et al., , 2000Maren, 1999;Wood et al., 2007;Shuman et al., 2009;Wood and Anagnostaras, 2009). This can be important in situations where some concern exists that the animals might not feel the shock. In previous studies One can see that activity is high during initial placement into the chamber, and then drops as the animal sets into freezing [paired two-tailed t-test, t(19) = 8.64, p < 0.0001]. Moreover, activity is somewhat higher in the dark (NIR) environment, when the animal is placed in a novel context for the baseline of the tone test [versus training baseline, t(19) = 3.28, p < 0.01], and activity drops again when the tone is turned on [t(19) = 11.8, p < 0.0001]. We do not advocate the use of raw activity as a measure of fear, due to substantial variability in the individual animal's baseline activity (Maren, 1998;Anagnostaras et al., 1999b). Rather, an activity suppression ratio (SR) can be computed, using the equation SR = (Activity on test)/(Activity on Test + Activity on Baseline). 
Suppression ratios of 0.5 indicate no fear (i.e., the same level of activity on test as on baseline), whereas those less than 0.5 indicate fear, and those more than 0.5 can indicate safety (Annau and Kamin, 1961;Bouton and Bolles, 1980;Maren et al., 1998;Anagnostaras et al., 2000). The Test. Freezing during the tone baseline (Tone BL, first 2 min) and during the tone (last 3 min) is depicted. In all cases, VideoFreeze estimated the means error nearly perfectly. this paper (which is even quoted in William James' 1890 Principles of Psychology). Freezing in rats was thought of as an instinctive fear response to cats as early as 1919 (Griffith, 1919;Curti, 1935Curti, , 1942. Within experimental psychology, a few references to freezing (and crouching) as a classically conditioned fear response exist at least since the 1950s (Hunt and Otis, 1953;Seward and Raskin, 1960). The modern popularity, however, is attributable to the R.C. Bolles laboratory (where Fanselow, Bouton, and Grossen were students), and the definition of freezing in rats (and mice) today mirrors that of Grossen and Kelley (1972). They closely followed the position of Bolles' famous 1970 theoretical paper that championed the use of natural species-specific defense reactions (including freezing) as conditioned fear responses (Grossen and Kelley, 1972;Bolles and Riley, 1973;Bolles and Collier, 1976;Fanselow and Bolles, 1979). Grossen and Kelley stated that "Freezing behavior was defined as the rat being immobile without movement of the vibrissa." This definition is very similar to that used by Bolles and colleagues (which allowed movements required for respiration during freezing), and is in sharp contrast to that used by R.J. and D.C. Blanchard for another defense behavior, crouching, which was time-sampled and defined as the "[rat's] weight supported by its hindlimbs, which were contracted, with forelimbs extended" (Blanchard and Fial, 1968;Blanchard and Blanchard, 1969). 
Crouching is no doubt somewhat related to freezing in rats, which often exhibit this posture when scared, but it is not very apparent in mice, is hence no longer preferred, and is often cited incorrectly as the definition of freezing. Moreover, freezing was initially scored continuously with a stopwatch (i.e., switched "on" whenever an animal froze, and switched "off" whenever the animal started moving), thus generating a total freezing time that could be divided by the total observation time (Bolles and Riley, 1973; Bolles and Collier, 1976; Phillips and LeDoux, 1992). This technique was replaced by time-sampling in the Bolles laboratory around the time of J. Altmann's (1974) paper on the validity of time-sampling of behavior, in general
we have emphasized the use of a semi-automated procedure that required digitizing 10 Hz video and clicking on individual frames to identify the mouse and compute true speed (for full details see Maren, 1999; Anagnostaras et al., 2000). This is compared to the fully automated Motion Index from the Video Freeze software, captured at 30 Hz for the first 2-s shock, as shown in Figure 5C. The average Motion Index captured most of the variability of true speed (r = 0.841, p < 0.0001). Moreover, there was a good linear fit between Motion Index and true speed; although we prefer to report the Motion Index in arbitrary units (au), one can roughly estimate activity burst speed (in cm/s) in mice using the equation Speed = (Motion Index - 97.3)/36.8, with the appreciation that this estimate has some error. Overall, Motion Index is a good replacement for semi-automated methods of scoring the activity burst UR.
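The linear conversion from Motion Index to speed given above can be sketched in Python as follows. The constants 97.3 and 36.8 are taken directly from the equation in the text and apply to mice under the conditions reported here; the function and constant names are hypothetical, and the output should be read as a rough estimate only.

```python
# Fit constants from the equation in the text (mice, 2-s shock period).
MOTION_INDEX_OFFSET = 97.3    # Motion Index (au) at an estimated speed of 0 cm/s
MOTION_INDEX_PER_CM_S = 36.8  # Motion Index units (au) per 1 cm/s of speed


def estimate_speed_cm_s(motion_index: float) -> float:
    """Rough estimate of activity burst speed (cm/s) from average Motion Index."""
    return (motion_index - MOTION_INDEX_OFFSET) / MOTION_INDEX_PER_CM_S


# Example Motion Index values are arbitrary and chosen only for illustration.
for mi in (97.3, 500.0, 1000.0):
    print(f"Motion Index {mi:7.1f} au -> about {estimate_speed_cm_s(mi):.1f} cm/s")
```

Motion Index values below 97.3 yield negative speed estimates under this equation, which suggests that the linear fit should not be extrapolated to very low activity.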

Discussion
Pavlovian fear conditioning of freezing behavior is of growing importance to a number of fields related to neuroscience and psychology, as a highly efficient way of studying learning and memory (Anagnostaras et al., 2000, 2001; Maren, 2008). In particular, contextual conditioning is a popular model of declarative, hippocampus-dependent memory (Matynia et al., 2008). In both rats and mice, freezing is a robust conditioned fear response with very low baseline behavior; that is, freezing is not a behavior rats or mice typically perform in response to ordinary novel stimuli, giving it a large advantage over many other measures of fear. A primary drawback of freezing behavior has been that it required hand scoring by continuous or time-sampled direct visual observation.
The history of the visual method of scoring freezing is not well known, and is worthy of review here as more researchers transition to automated scoring. Several references to this behavior as an expression of fear can be found in Darwin's seminal The Expression of the Emotions (1872), such as the one in the opening paragraph of
The Motion Index can be used to estimate locomotor activity. Activity during the baseline from the training day (Context) and tone test (Tone) is depicted. Activity starts high (baseline, first 2 min) and drops dramatically after conditioning (Test, last 3 min). Activity is higher in the NIR-only light during the tone test (Tone Baseline, first 2 min), and drops when the tone is played (Tone Test, last 3 min). (B) Activity suppression. Activity suppression scores can be used to correct for differences in baseline activity and can be used as an alternative measure of fear. (C) Shock Reactivity. The Motion Index during the 2-s shock is compared to true mouse speed. Shock reactivity could reliably be measured using the Motion Index and showed a good linear fit with true speed.

chambers to be used for context and tone testing, saving money and space, and (3) a sophisticated movement detection algorithm allows us to score very small movements needed to make a freezing determination, as well as large movements such as during the activity burst UR. Finally, we tested the ability of the system to score freezing and demonstrated that it produces freezing scores nearly identical to human observers, and can further score the high-speed activity burst UR.
As a final note, we strongly encourage experimenters to stay connected to the measured behavior by viewing at least some of the videos of animals during fear conditioning. We also encourage investigators to repeat for themselves the validation described here. Automated systems can detach the experimenter from the behavior, and with the increasing popularity of the conditioned freezing task, more and more researchers are using it as an assay of memory without knowledge of the rich and long history of this task. However, it is only with deep knowledge and close observation and measurement of the behavior that the paradigm can ultimately be improved and refined.

Acknowledgments
This research was supported by a Hellman Fellowship (Stephan G. Anagnostaras). Suzanne C. Wood was supported by an NSF Fellowship and an NIH NRSA (DA026259). Tristan Shuman and Denise J. Cai were supported by a Chancellor's Interdisciplinary Collaboratories Grant. We thank Michael Fanselow, Mark Bouton, Ronald Sigmundi, Stephen Maren, and Neal Grossen for thoughtful historical information.
(Altmann, 1974). Instantaneous or momentary time-sampling of behavior is a highly validated method that is superior in terms of accuracy to other forms of time-sampling, and has been studied extensively (Altmann, 1974; Meyers and Grossen, 1974; Powell et al., 1975, 1977). Instantaneous time-sampling of freezing was used in the work of Curti (1935, 1942) and emerged in the published works of the R.C. Bolles laboratory around 1979-1980 (Fanselow and Bolles, 1979; Bouton and Bolles, 1980; Sigmundi et al., 1980) as an efficient way of estimating continuous observation with a stopwatch. Of course, the task's popularity today is owed to a great extent to the celebrated 1992 findings by J. J. Kim, M. S. Fanselow, R. G. Phillips, and J. E. LeDoux, which clearly outlined the respective roles of the amygdala and hippocampus in conditioned freezing (Kim and Fanselow, 1992; Phillips and LeDoux, 1992).
Previously, Anagnostaras et al. (2000) reported a proprietary automated system of scoring that alleviated the burden of hand scoring, but which also had a number of shortcomings, most significantly the lack of commercial availability. Here, we validate a new fear conditioning system that resolves those problems and offers a number of further refinements. Aside from commercial "turn-key" availability from a well-established and stable supplier, the technical refinements in this system make it truly state of the art: (1) digital video confers high resistance to electrical noise, and ensures that quality compressed videos can be stored indefinitely for later use, (2) the use of LED near-infrared lighting within a sound- and light-attenuating enclosure ensures that lighting conditions viewed by the camera remain similar, even when drastic environmental modifications are made (Figure 1), and allows the same