ORIGINAL RESEARCH article
Front. Hum. Neurosci., 12 November 2009 | https://doi.org/10.3389/neuro.09.044.2009
VA Boston Healthcare System, Boston, MA, USA
Department of Psychology, Helen Wills Neuroscience Institute, University of California, Berkeley, CA, USA
Visual categorization is a remarkable ability that allows us to effortlessly identify objects and efficiently respond to our environment. The neural mechanisms by which visual categories become well established are largely unknown. Studies of initial category learning implicate a network of regions that includes inferior temporal cortex (ITC), medial temporal lobe (MTL), basal ganglia (BG), premotor cortex (PMC) and prefrontal cortex (PFC). However, how these regions change with extended learning is poorly characterized. To understand the neural changes in the transition from initially learned to well-practiced categorization, we used functional MRI to compare brain activity and functional connectivity while subjects performed an initially learned categorization task (100 trials of training) and a well-practiced task (4250 trials of training). We demonstrate that a similar network is implicated in initially learned and well-practiced categorization. Additionally, connectivity analyses reveal increased coordination between ITC, MTL, and PMC when subjects make category judgments in the well-practiced task. These results suggest that category learning involves increased coordination within a distributed network of regions supporting the retrieval and representation of categories.
Visual categorization allows us to effortlessly organize a wide range of sensory information into a limited number of meaningful categories. This process enables efficient responses to novel stimuli and is the foundation for visual perception and memory. Although initial category learning and well-practiced categorization have been well studied at both the behavioral (Fabre-Thorpe, 2003; Ashby and Maddox, 2005) and neural levels (Reber et al., 1998; Kanwisher, 2000; Haxby et al., 2001; Seger, 2008), few studies have directly compared these time points in learning or examined the large-scale network changes in the transition from initially learned to well-practiced categorization.
Initial visual category learning relies on representations in early visual cortex and inferior temporal cortex (ITC), though the additional mechanisms and brain regions involved depend on the strategy that subjects use. According to COVIS (COmpetition between Verbal and Implicit Systems), a prominent theory of the neural basis of category learning, when the rule that separates categories is verbalizable, brain regions supporting working memory, such as prefrontal cortex (PFC) and parts of the basal ganglia (BG) including the head of the caudate nucleus, are implicated (Ashby and Maddox, 2005). In contrast, when the category rule is not verbalizable, a procedural system that depends on the tail of the caudate nucleus is involved. Studies demonstrating initial category learning deficits in patients with damage to the PFC (Barcelo and Knight, 2002) and BG (Maddox et al., 2005) support this theory. The involvement of the medial temporal lobe (MTL) in initial category learning is unclear: some studies have demonstrated that MTL damage significantly impairs visual category learning (Hopkins et al., 2004), while others have shown that forms of category learning can occur without an intact MTL (Knowlton and Squire, 1993; Knowlton et al., 1994). Studies have also implicated the premotor cortex (PMC) in learning new visual categories, especially categories that require a stereotyped motor response (Halsband and Freund, 1990; Boettiger and D’Esposito, 2005).
Although PFC, PMC, BG, and MTL have been shown to be important for initially learning visual categories, damage to these regions does not consistently produce deficits in the automatic recognition of well-established categories (Freedman et al., 2001; Zgaljardic et al., 2003; Squire et al., 2004). There are some reports, however, of patients with MTL damage who present with object discrimination difficulties, suggesting that the MTL may facilitate higher-level object processing (Lee et al., 2005). Additionally, lateral frontal regions have feedback connections to object recognition regions in ITC, and studies have shown that lateral frontal regions modulate ITC to facilitate successful recognition (Fuster et al., 1985; Barcelo et al., 2000; Gazzaley et al., 2007). A goal of the current study is to further characterize the supporting roles that the PFC, PMC, BG, and MTL play in well-established categorization and how the interactions between these regions and ITC change with category learning.
In contrast to patients with PFC, PMC, BG, and MTL lesions, patients with ITC lesions present with severe deficits in recognizing well-established categories (Warrington, 1982). These patients can often describe an object in their visual field in great detail, including its color, texture, and shape, but are unable to integrate this information to identify the object. Thus, ITC has a crucial role in organizing perceptual features into an integrated percept, essentially linking perception with recognition. In addition to its role in representing and recognizing well-established categories, ITC has also been shown to be modified by learning. Studies have generally demonstrated increases in ITC activity and enhanced neural tuning after extensive training with novel categories (Gauthier et al., 1999; Op de Beeck et al., 2006; Jiang et al., 2007) and when comparing category experts with category novices (Gauthier et al., 2000). These training-related changes have been demonstrated both in regions that preferentially respond to the stimuli before training, such as the lateral occipital cortex for shapes, and in the overall pattern of activity across ITC (Op de Beeck et al., 2006). Though these studies clearly demonstrate ITC changes with learning, the interpretation of these changes and the overall organization of ITC are intensely debated (Tarr and Gauthier, 2000; Haxby et al., 2001; Kanwisher and Yovel, 2006). Current models suggest that extensive category learning is accompanied by changes in activity in regions representing that object or process, such as the fusiform face area for faces, which may reflect the recruitment of new processes or modifications of object representations (Tarr and Gauthier, 2000; Kanwisher and Yovel, 2006). An alternative model suggests that changes distributed across all of ITC may be more important to category learning and representation than changes in any specific region(s) (Haxby et al., 2001).
A goal of the current study is to compare the regional and network changes that occur during extensive category learning to better characterize the mechanisms of category learning in ITC.
In addition to characterizing changes within PFC, PMC, BG, MTL, and ITC, we also seek to explore how the overall involvement of these regions changes from initial categorization to later in learning. Although this could occur in many ways, we propose two general possibilities: (1) there could be a shift from the network including PFC, PMC, BG, MTL, and ITC regions to a new network more focused on visual regions, or (2) the same network could be used for visual categorization both early and late in learning, with practice redistributing activity within that network (for a review of these mechanisms see Kelly and Garavan, 2005). Several studies demonstrate a shift from the initial network of regions to a new network when training involves recruiting different strategies at the beginning and end of practice. For example, Poldrack et al. (2001) showed that during classification learning, MTL structures were recruited initially and that activation later shifted toward the basal ganglia, when learning was more associative. Additionally, Fletcher et al. (1999) demonstrated a decrease in right fronto-parietal activity and an increase in left fronto-parietal connectivity during the acquisition of artificial grammar rules. Insofar as category learning is accompanied by salient shifts in cognitive and neural strategies, we might expect the network to shift from PFC, PMC, BG, MTL, and ITC early in learning to a network more focused on ITC regions, since ITC lesions are the only lesions that consistently impair the retrieval of well-established categories.
An alternative to shifting to a new network is that the same network of PFC, PMC, BG, and MTL serves as a permanent scaffold throughout all stages of learning: these regions initially facilitate category learning, retrieval, and decision-making and could continue to support updating, retrieval, and decision-making as categories become well established. This model suggests that a similar network including PFC, PMC, BG, MTL, and ITC is recruited throughout category learning, though parts of this network, such as the PFC and BG, may be less active late in learning, when the demands of updating visual representations and of retrieval are less pronounced (Wagner et al., 2001). Thus, another goal of the current study is to evaluate the overall network changes that accompany visual category learning and to decide between these two models.
Standard univariate analyses of regional activation changes could help determine whether the same network or a different network is recruited with category learning. However, univariate analyses cannot assess whether regions that change with learning are functionally connected. One approach to this issue is to measure the covariance of activity between a region known to be involved in the task and the rest of the brain. Comparing this covariance map before and after training could help characterize the changes in task-related functional networks that accompany categorization training. A functional MRI method well suited to this approach is coherence analysis. In coherence analysis, a reference or seed region is identified, and the time series in this region is correlated, in the frequency domain, with the time series of every other voxel in the brain (Sun et al., 2004, 2007). In this way, coherence analysis yields a task-related network for the seed region, and we can measure how this network differs between initially learned and well-practiced categorization. The main advantage of coherence over simply correlating activity with a seed region is that coherence does not depend on an estimate of the hemodynamic response function or a model of neural activity. Thus, coherence is not affected by regional differences in hemodynamic responses, whereas correlating activity is biased toward high correlations between regions with similar hemodynamic responses (Muller et al., 2003). Also, by using partial coherence, we can measure the task-induced relationship between two regions while factoring out the stimulus-locked response (see Sun et al., 2004, for further details).
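The contrast between correlation and coherence can be illustrated with synthetic data. In this sketch, two hypothetical regions carry the same signal, but one is delayed by 3 s, standing in for a regional difference in hemodynamic lag. Zero-lag correlation drops, whereas magnitude-squared coherence, which discards phase, stays high. The signals, the 3-s lag, and the 1-Hz sampling rate (TR = 2 s, doubled by interpolation) are assumptions; SciPy's `scipy.signal.coherence` is used with 64-point Hann segments overlapping by 32 points.

```python
import numpy as np
from scipy.signal import coherence

FS = 1.0  # Hz; assumed effective sampling rate (TR = 2 s, doubled by interpolation)
rng = np.random.default_rng(0)

# Hypothetical smooth signal shared by two regions (4-sample moving average of white noise).
shared = np.convolve(rng.standard_normal(1008), np.ones(4) / 4, mode="same")
region_a = shared
region_b = np.roll(shared, 3)  # same signal, but this region's response lags by 3 s

# Zero-lag Pearson correlation is sensitive to the lag.
r = np.corrcoef(region_a, region_b)[0, 1]

# Magnitude-squared coherence, averaged over the 0-0.15 Hz band, ignores the phase lag.
f, coh = coherence(region_a, region_b, fs=FS, window="hann", nperseg=64, noverlap=32)
band_coh = coh[f <= 0.15].mean()
print(f"correlation = {r:.2f}, band-averaged coherence = {band_coh:.2f}")
```

A pure delay changes only the phase of the cross-spectrum, so its magnitude, and hence coherence, is largely preserved; the small residual loss comes from the delayed samples falling partly outside each windowed segment.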
In the current study, we used two categorization tasks with similar stimuli, a novel task (100 training trials) and a well-practiced task (4250 training trials), to assess initially learned and well-practiced categorization in a single fMRI session. We chose faces as stimuli because faces have been shown to be obligatorily processed in a focal ITC region (the right fusiform face area, FFA; Kanwisher et al., 1997; McKone et al., 2007), which can be functionally localized and assessed for learning-related changes. Additionally, the right FFA can serve as the seed region in the coherence analysis to identify categorization-related networks and assess how these networks change with visual expertise.
Ten right-handed subjects aged 20–27 years (M = 22.4) were recruited from the University of California, Berkeley. All participants were screened for medical, neurological, and psychiatric illnesses and for use of prescription medications. All subjects gave written informed consent prior to participation in the study according to procedures approved by the University of California, Berkeley Committee for Protection of Human Subjects.
Lifelike faces were created with the Faces composite face-making software (Faces version 3.0). Using a template face, two categorization tasks were designed: the eyebrow–mouth task and the forehead–nose task (Figure 1). In the eyebrow–mouth task, eyebrow height and mouth height varied in 2 mm increments to make 10 faces, while the other features remained constant (Figure 1A). In the forehead–nose task, forehead height and nose length varied in 2 mm increments to make 10 faces, while the other features remained constant (Figure 1B). In each task, subjects had to integrate information from both facial features to achieve optimal accuracy. To ensure that the results were not due to specific feature properties, half of the subjects (5) trained with the eyebrow–mouth task and the other half trained with the forehead–nose task. At the beginning of training, subjects were shown a matrix of faces (for example, Figure 1A) and given a verbal description of the categorization task, for example, “faces with higher eyebrows and lower mouths are generally in category 1 and faces with lower eyebrows and higher mouths are generally in category 2.” Next, subjects received 250 trials of self-paced computer training in which they were presented with a face and responded by pressing, with their right hand, one of two buttons, each assigned to one category. Feedback (blue “correct”/red “incorrect”) was provided immediately after each trial to further facilitate category learning. After each 250-trial session, subjects took a break and received a feedback matrix showing their accuracy and reaction time for each face and their monetary bonus (+0.02 per correct trial, −0.01 per incorrect trial). Subjects used this information to try to improve their performance. Two 250-trial sessions were performed on the first training day and three on each of the five training days thereafter.
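The training totals can be checked with a few lines of arithmetic. This sketch assumes every session contained exactly 250 trials and that the stated 4250-trial total excludes the 100 review trials given on the scan day; the `session_bonus` helper and the example score are hypothetical and simply apply the stated +0.02/−0.01 per-trial rule.

```python
TRIALS_PER_SESSION = 250
# Day 1: two sessions; days 2-6: three sessions each, as described in the text.
SESSIONS_PER_DAY = [2] + [3] * 5

total_trials = TRIALS_PER_SESSION * sum(SESSIONS_PER_DAY)  # 17 sessions in total


def session_bonus(n_correct: int, n_trials: int = TRIALS_PER_SESSION) -> float:
    """Hypothetical helper: +0.02 per correct trial and -0.01 per incorrect trial.

    The sum is computed in hundredths (integers) to avoid floating-point drift.
    """
    n_incorrect = n_trials - n_correct
    return (2 * n_correct - 1 * n_incorrect) / 100


print(total_trials)        # 4250
print(session_bonus(240))  # 4.7 for a hypothetical 240/250 session
```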
To ensure that subjects were learning a general strategy and not memorizing individual faces, parts of the template face changed each day (see Figure 2). However, parts of the template faces that could potentially affect performance (such as the hair for the forehead–nose rule) were not changed.
Figure 1. Example stimuli used in the face classification task. Eyebrow height and mouth height varied in 2 mm increments for faces in the eyebrow/mouth task (A) and forehead height and nose height varied in 2 mm increments for faces in the forehead/nose task (B). This produced 2 matrices of 12 faces: 6 faces were assigned to a left button press and 6 to a right button press. Only 10 faces were used during training (shown surrounded by thick borders). During scanning, two new faces were introduced for each task (shown surrounded by dashed borders).
Figure 2. Different template faces were used for each day of training. For the forehead/nose task, eyes, eyebrows, and mouth changed day to day whereas the hair, nose, ears, and jaw were constant. For the eyebrow/mouth task, hair and eyes changed day to day whereas the eyebrows, mouth, nose, ears and jaw were constant.
The fMRI scan was performed on the seventh day after training began. Before scanning, subjects received 100 trials of feedback training on the task they had learned over the previous 6 days. After this review of the well-practiced task, subjects were given the new categorization task: subjects who had trained on the eyebrow–mouth task for 6 days were given the forehead–nose task (and vice versa). As on day 1 with the well-practiced task, subjects received explicit instructions for categorizing the faces and performed 100 trials of feedback training in order to attain a steady level of performance. During the pre-fMRI training and in the fMRI scanner, subjects performed the well-practiced and initially learned tasks with the same template face (see Figure 1). At the beginning of each scan, subjects were told which task they were performing. Additionally, initially learned and well-practiced task scans were blocked (three scans in a row) to eliminate confusion about which task subjects were performing.
While in the scanner, subjects performed the categorization tasks and a one-back task using blocks of face and scene stimuli (courtesy of Nancy Kanwisher, MIT). The categorization tasks in the scanner differed from training in several ways. First, two new face configurations for each task were introduced during the scanning session (see Figure 1, dashed borders). These configurations had extreme feature values and were easy to classify. They were introduced to assess whether subjects were learning general strategies (in which case they would effortlessly apply the strategy to the novel configurations) or memorizing specific feature configurations (in which case they would notice the new configurations). Second, in contrast to training, in which stimulus presentation was self-paced, faces in the scanner were displayed for 2 s to control exposure duration across trials and participants. Third, no feedback was provided during scanning, to promote consistent strategy use throughout the session. Lastly, instead of a 200-ms inter-trial interval (ITI), a 12-s ITI was used to allow hemodynamic responses to return to baseline. Subjects performed three consecutive runs of categorization with the initially learned task and three consecutive runs with the well-practiced task. The order of initially learned and well-practiced task runs was counterbalanced across subjects. Each run lasted 5 min 36 s and contained 24 trials.
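The stated run length follows from the trial timing. This sketch assumes each run consisted only of the 24 trials (a 2-s face plus a 12-s ITI) and that the 30 s of pre-acquisition pulses are not counted.

```python
TR_S = 2.0          # repetition time (s)
FACE_S = 2.0        # face display duration in the scanner (s)
ITI_S = 12.0        # inter-trial interval in the scanner (s)
TRIALS_PER_RUN = 24

run_s = TRIALS_PER_RUN * (FACE_S + ITI_S)      # 336 s per run
minutes, seconds = divmod(int(run_s), 60)      # 5 min 36 s
volumes_per_run = int(run_s // TR_S)           # 168 volumes at TR = 2 s
print(f"{minutes} min {seconds} s, {volumes_per_run} volumes")
```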
After the categorization tasks, subjects performed a one-back task with faces and scenes. In the one-back task, subjects were shown 16-s blocks of either novel faces, scenes, or fixation. During the face and scene blocks, 20 images were shown for 500 ms with a 300-ms fixation cross between each image. To keep the subject’s attention focused on the images throughout the task, subjects were instructed to press both thumbs on the response pad when the current image was the same as the image immediately preceding it (on average, one response was required for each block of images). There were seven blocks of each type and the scan lasted 5 min and 20 s.
Functional images were acquired using a gradient-echo echo-planar sequence (TR = 2000 ms, TE = 28 ms, matrix size = 64 × 64, FOV = 22.4 cm) sensitive to BOLD contrast. Each functional volume consisted of 18 axial slices, 5 mm thick with a 0.5 mm gap between slices, providing whole-brain coverage except for portions of the inferior cerebellum and the most superior extent of the parietal lobe. For each scan, 30 s of gradient and RF pulses preceded data acquisition to allow steady-state tissue magnetization and to allow the subject to habituate to the scanner noise before performing the task. Stimuli were presented using E-Prime software (Psychology Software Tools, Pittsburgh, PA), and all stimuli subtended a visual angle of approximately 5°. Participants viewed the images, back-projected onto a custom screen mounted at chest level, via an angled mirror mounted inside the head coil. Responses were made using a hand-held fiber-optic button box.
fMRI Data Analysis – Univariate
Functional images acquired from the scanner were reconstructed from k-space using a linear time-interpolation algorithm to double the effective sampling rate. Image volumes were corrected for slice-timing skew using temporal sinc interpolation. Data were preprocessed with SPM2 (Wellcome Department of Cognitive Neurology, London). Images were realigned using a six-parameter, rigid-body, least-squares alignment and spatially smoothed with an 8 mm FWHM Gaussian kernel. Univariate statistical analyses were performed on individual subjects’ data with a modified general linear model (GLM) as implemented in SPM2. The fMRI time series data were modeled as a series of 2-s events convolved with a canonical hemodynamic response function (HRF), and the resulting functions were used as covariates in the GLM. For the categorization task, two covariates were used: one for the initially learned task and one for the well-practiced task. For the face–scene one-back task, two covariates were used: one for faces and one for scenes. These covariates, along with a basis set of cosine functions that high-pass filtered the data, were included in the GLM. The least-squares parameter estimates of the height of the best-fitting canonical HRF for each condition were used in pairwise contrasts. For each subject, images of parameter estimates for each contrast of interest were spatially normalized to an EPI template based on the MNI305 stereotactic space (Collins et al., 1998). This was accomplished using a 12-parameter affine transformation together with a non-linear transformation involving cosine basis functions. Volumes were then resampled to 2-mm cubic voxels. These normalized contrasts were submitted to a second-level one-sample t-test, in which the mean estimate across participants at each voxel was tested against zero.
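The single-subject model can be sketched with NumPy and SciPy. This is a minimal illustration, not SPM2 itself: it assumes an SPM-style double-gamma canonical HRF, SPM's default 128-s high-pass cutoff (the cutoff is not stated in the text), and ordinary least squares without SPM's temporal autocorrelation modeling. For illustration the two conditions are interleaved in one hypothetical run, whereas in the study they occupied separate runs.

```python
import numpy as np
from scipy.stats import gamma

TR = 2.0  # s, from the acquisition parameters


def canonical_hrf(dt, length_s=32.0):
    """SPM-style double-gamma HRF sampled every dt seconds (assumed shape parameters)."""
    t = np.arange(0.0, length_s, dt)
    hrf = gamma.pdf(t, 6) - gamma.pdf(t, 16) / 6.0  # peak near 5 s, later undershoot
    return hrf / hrf.sum()


def event_regressor(onsets_s, n_scans, tr, event_s=2.0, dt=0.1):
    """2-s boxcar events convolved with the HRF on a fine grid, then sampled once per TR."""
    n_fine = int(round(n_scans * tr / dt))
    box = np.zeros(n_fine)
    for onset in onsets_s:
        start = int(round(onset / dt))
        box[start:start + int(round(event_s / dt))] = 1.0
    conv = np.convolve(box, canonical_hrf(dt))[:n_fine]
    return conv[:: int(round(tr / dt))]


def dct_basis(n_scans, tr, cutoff_s=128.0):
    """Discrete cosine drift regressors used as a high-pass filter (cutoff is an assumption)."""
    k_max = int(np.floor(2 * n_scans * tr / cutoff_s))
    n = np.arange(n_scans)
    cols = [np.cos(np.pi * k * (2 * n + 1) / (2 * n_scans)) for k in range(1, k_max + 1)]
    return np.column_stack(cols) if cols else np.empty((n_scans, 0))


def fit_glm(y, task_regressors, tr):
    """Ordinary least squares; returns the parameter estimates of the task regressors only."""
    n = len(y)
    X = np.column_stack(task_regressors + [dct_basis(n, tr), np.ones(n)])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[: len(task_regressors)]


# Hypothetical run: 24 trials every 14 s, alternating between two conditions.
n_scans = 168
onsets = np.arange(24) * 14.0
x_a = event_regressor(onsets[0::2], n_scans, TR)
x_b = event_regressor(onsets[1::2], n_scans, TR)
rng = np.random.default_rng(0)
drift = 0.5 * dct_basis(n_scans, TR)[:, 0]  # slow drift that the cosine basis absorbs
y = 3.0 * x_a + 1.0 * x_b + drift + 0.01 * rng.standard_normal(n_scans)
beta = fit_glm(y, [x_a, x_b], TR)
print(beta)  # close to [3.0, 1.0]
```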
For exploratory analyses (see Table 1), regions of activation were identified using an uncorrected two-tailed threshold of p < 0.001 and a minimum cluster size of five contiguous voxels. For visualization purposes, unthresholded (Figure 4) and thresholded (Figure 5) statistical parametric maps were overlaid onto a normalized T1-weighted image using MRIcro software (www.mricro.com). To determine the significance of regions in this exploratory analysis, we applied a whole-brain family-wise error correction (p < 0.05). Additionally, to determine the significance of a priori regions, we performed a small volume correction (family-wise error rate of p < 0.05) for each region using MNI anatomical regions of interest: the left and right hippocampi, parahippocampal gyri, middle frontal gyri, inferior frontal gyri, precentral gyri, fusiform gyri, inferior temporal gyri, caudate, and putamen.
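The exploratory threshold (two-tailed p < 0.001 with at least five contiguous voxels) can be sketched with SciPy. The sketch assumes a second-level t map with df = 9 (ten subjects in a one-sample test) and face-connected (6-neighbor) clusters, which is the default structure of `scipy.ndimage.label`; SPM's own cluster definition may differ. It does not implement the family-wise or small-volume corrections.

```python
import numpy as np
from scipy import ndimage, stats


def cluster_threshold(t_map, df, p_two_tailed=0.001, min_voxels=5):
    """Keep voxels with |t| above the two-tailed threshold that lie in clusters of >= min_voxels.

    Positive and negative effects are clustered separately; ndimage.label's default
    structure connects face-adjacent voxels (6-neighborhood in 3-D).
    """
    t_crit = stats.t.isf(p_two_tailed / 2, df)
    keep = np.zeros(t_map.shape, dtype=bool)
    for sign in (1.0, -1.0):
        labels, n_clusters = ndimage.label(sign * t_map > t_crit)
        if n_clusters == 0:
            continue
        sizes = np.bincount(labels.ravel())
        large = np.flatnonzero(sizes >= min_voxels)
        large = large[large != 0]  # label 0 is the background
        keep |= np.isin(labels, large)
    return np.where(keep, t_map, 0.0), t_crit


# Hypothetical t map: one 6-voxel positive cluster and one 3-voxel negative cluster.
t_map = np.zeros((10, 10, 10))
t_map[2, 2, 1:7] = 6.0    # survives: large enough and above threshold
t_map[7, 7, 1:4] = -6.0   # removed: only 3 contiguous voxels
thresholded, t_crit = cluster_threshold(t_map, df=9)
print(round(float(t_crit), 2), np.count_nonzero(thresholded))
```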
Definition of Face Selective Regions
For each subject, the fusiform face area (FFA) was functionally defined using the contrast of novel faces minus scenes in the face–scene one-back task (Kanwisher et al., 1997). The FFA was defined by taking the peak voxel within the middle fusiform gyrus that responded more to faces than to scenes and selecting the nine most significant voxels contiguous with the peak voxel. If the threshold had to be dropped below a t-value of 1.5 to find the peak voxel, the region was deemed unreliable and was excluded from further analyses. Activity within the FFA region of interest (ROI) was averaged across all voxels. This procedure was repeated for both hemispheres. Using this procedure, we successfully localized the right FFA in 10/10 subjects and the left FFA in 9/10 subjects. We performed ROI analyses on the right and left FFA for the univariate categorization model and on the right FFA for the coherence model (see below). For each subject, parameter estimates yielded by the GLM were extracted for each covariate and averaged within each ROI. These parameter estimates served as the dependent measures for across-subject “random-effects” analyses.
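One way to implement this ROI definition is greedy region growing from the peak voxel. This is a hypothetical sketch: the function `grow_roi`, the boolean `mask` standing in for an anatomical middle fusiform gyrus mask, face-connected neighborhoods, and reading the rule as "the peak plus nine contiguous voxels" are all assumptions rather than the authors' code.

```python
import heapq

import numpy as np


def grow_roi(t_map, mask, n_extra=9, min_peak_t=1.5):
    """Return a set of voxel indices: the in-mask peak plus the n_extra most significant
    voxels reachable through face-adjacent neighbors, or None if the peak t < min_peak_t."""
    masked = np.where(mask, t_map, -np.inf)
    peak = tuple(int(i) for i in np.unravel_index(np.argmax(masked), t_map.shape))
    if masked[peak] < min_peak_t:
        return None  # region deemed unreliable
    roi = {peak}
    frontier = []  # max-heap of candidate neighbors, stored as (-t, index)

    def push_neighbors(voxel):
        for axis in range(3):
            for step in (-1, 1):
                nb = list(voxel)
                nb[axis] += step
                nb = tuple(nb)
                inside = all(0 <= nb[i] < t_map.shape[i] for i in range(3))
                if inside and mask[nb] and nb not in roi:
                    heapq.heappush(frontier, (-float(t_map[nb]), nb))

    push_neighbors(peak)
    while len(roi) < 1 + n_extra and frontier:
        _, voxel = heapq.heappop(frontier)
        if voxel not in roi:  # the same voxel may have been pushed more than once
            roi.add(voxel)
            push_neighbors(voxel)
    return roi


# Hypothetical t map that decreases with distance from voxel (3, 3, 3).
grid = np.indices((7, 7, 7)).transpose(1, 2, 3, 0)
t_map = 10.0 - np.linalg.norm(grid - 3, axis=-1)
roi = grow_roi(t_map, np.ones(t_map.shape, dtype=bool))
print(len(roi))  # 10 voxels: the peak plus nine neighbors
```

The ROI time series used as a seed would then be the mean across the returned voxels.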
fMRI Data Analysis – Partial Coherence
To identify networks of functional connectivity for the right FFA, we generated coherence and partial coherence maps using the task-specific coherence between the right FFA seed and all other voxels. One potential limitation of coherence is that it can be driven by a stimulus-locked response. For example, two regions could have high coherence because each responds independently to a stimulus rather than because they are part of a common functional network. Partial coherence takes the stimulus-locked response into account and estimates any coherence remaining between two time series after that response has been factored out (see Sun et al., 2004, for further discussion of partial coherence). Visual categorization is a very rapid process, and using coherence alone may only allow the examination of learning-related changes in stimulus- and response-locked regions such as early visual/inferior temporal and motor regions. Partial coherence allows the examination of other regions implicated in categorization that are not stimulus- or response-locked, such as the hippocampus, basal ganglia, prefrontal cortex, and premotor cortex. We therefore performed our analyses on the partial coherence maps rather than the coherence maps. To identify practice-related changes in functional interactions, we then contrasted these task-specific partial coherence maps. This procedure is described briefly below; see Sun et al. (2004) for further detail.
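A minimal sketch of partial coherence on synthetic data, assuming the stimulus-locked signal z is a known task time course and using the standard conditional cross-spectrum formula; it illustrates the idea and is not the Sun et al. (2004) implementation. Two hypothetical regions respond to the same stimulus with different gains and delays; only in the second case do they also share a network signal. Welch settings follow the Methods, and the 1-Hz sampling rate is an assumption.

```python
import numpy as np
from scipy.signal import csd

# Welch settings mirroring the Methods; fs = 1 Hz is an assumption (TR = 2 s, doubled).
WELCH = dict(fs=1.0, window="hann", nperseg=64, noverlap=32)


def spectrum(a, b):
    """Welch cross-spectral density of a and b (SciPy's convention: conj(A) * B)."""
    return csd(a, b, **WELCH)


def partial_coherence(x, y, z, f_max=0.15):
    """Band-averaged coherence of x and y after removing what is linearly predictable from z."""
    f, s_xy = spectrum(x, y)
    _, s_xz = spectrum(x, z)
    _, s_zy = spectrum(z, y)
    _, s_yz = spectrum(y, z)
    s_xx = spectrum(x, x)[1].real
    s_yy = spectrum(y, y)[1].real
    s_zz = spectrum(z, z)[1].real
    s_xy_z = s_xy - s_xz * s_zy / s_zz          # conditional cross-spectrum
    s_xx_z = s_xx - np.abs(s_xz) ** 2 / s_zz     # conditional auto-spectra
    s_yy_z = s_yy - np.abs(s_yz) ** 2 / s_zz
    pc = np.abs(s_xy_z) ** 2 / (s_xx_z * s_yy_z)
    return pc[f <= f_max].mean()


rng = np.random.default_rng(0)
t = np.arange(1008)
stim = ((t % 14) < 2).astype(float)                          # hypothetical 2-s trial every 14 s
x_resp, y_resp = 3 * np.roll(stim, 4), 2 * np.roll(stim, 6)  # region-specific gains and delays
network = rng.standard_normal(t.size)                        # hypothetical shared network signal


def noise():
    return 0.5 * rng.standard_normal(t.size)


pc_stimulus_only = partial_coherence(x_resp + noise(), y_resp + noise(), stim)
pc_with_network = partial_coherence(x_resp + network + noise(), y_resp + network + noise(), stim)
print(f"stimulus only: {pc_stimulus_only:.2f}, shared network: {pc_with_network:.2f}")
```

Because the stimulus-driven parts are removed frequency by frequency, the differing delays of the two regions do not leave residual coherence; what remains reflects the shared, non-stimulus-locked signal.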
Selection of reference voxels
For each subject, we used the average time series of the voxels within the right fusiform face area (rFFA), defined in the univariate analysis as described above, as the seed for the coherence analyses.
Generation of condition-specific time-series
To generate condition-specific time series, the data were separated into initially learned task and well-practiced task blocks. Each condition-specific time series had a total of 1008 data points.
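The 1008-point figure is consistent with three runs of 168 volumes each, doubled by the time interpolation applied during reconstruction (3 × 168 × 2 = 1008). The sketch below concatenates one condition's runs using synthetic data; mean-centering each run before concatenation is an assumption, not a step stated in the text.

```python
import numpy as np

N_RUNS = 3            # runs per condition
VOLS_PER_RUN = 168    # 336-s run at TR = 2 s
UPSAMPLE = 2          # time interpolation doubles the effective sampling rate


def condition_time_series(runs):
    """Concatenate one condition's runs; per-run mean-centering is an assumption."""
    return np.concatenate([run - run.mean() for run in runs])


rng = np.random.default_rng(3)
runs = [rng.standard_normal(VOLS_PER_RUN * UPSAMPLE) for _ in range(N_RUNS)]  # synthetic voxel
ts = condition_time_series(runs)
print(ts.shape)  # (1008,)
```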
Estimation of condition-specific coherence maps
Coherence is a frequency-domain measure of the linear association between two time series, defined as the magnitude-squared cross-spectrum divided by the product of the power spectra of the two time series. Here, we calculated coherence by estimating the cross-spectrum and power spectra using Welch’s periodogram-averaging method.
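For reference, the Welch estimates and the coherence can be written in standard form, where $X_k(f)$ and $Y_k(f)$ denote the discrete Fourier transforms of the $k$-th mean-centered, windowed segment of the seed and voxel time series, $^{*}$ denotes complex conjugation, and $K$ is the number of segments. The notation is ours, given as an aid rather than as the authors' formulation.

$$
\hat{S}_{xy}(f) = \frac{1}{K}\sum_{k=1}^{K} X_k(f)\,Y_k^{*}(f), \qquad
\hat{S}_{xx}(f) = \frac{1}{K}\sum_{k=1}^{K} \lvert X_k(f)\rvert^{2}, \qquad
\hat{S}_{yy}(f) = \frac{1}{K}\sum_{k=1}^{K} \lvert Y_k(f)\rvert^{2}
$$

$$
\hat{C}_{xy}(f) = \frac{\lvert \hat{S}_{xy}(f)\rvert^{2}}{\hat{S}_{xx}(f)\,\hat{S}_{yy}(f)}, \qquad 0 \le \hat{C}_{xy}(f) \le 1
$$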
Specifically, the power spectra were estimated by averaging the magnitude-squared discrete Fourier transforms of short, overlapping segments of the condition-specific time series from each voxel. Each segment was 64 data points long, mean-centered, and windowed with a 64-point Hanning window; consecutive segments overlapped by 32 data points. Compared with calculating the power spectrum from a single discrete Fourier transform of the entire time series, averaging the spectra over several shorter segments decreases the variance of the power spectral estimate (Welch’s method). Similarly, the cross-spectrum was estimated by averaging the cross-spectra of the shorter segments, where each segment’s cross-spectrum was calculated by multiplying the discrete Fourier transform of the condition-specific time series of the reference region (rFFA) by the complex conjugate of the discrete Fourier transform of the condition-specific time series of each other voxel in the brain.
We then generated coherence maps for the seed ROI for each condition using the estimate of the band-averaged coherence within the bandwidth of the hemodynamic response function (0–0.15 Hz).
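The following NumPy sketch mirrors these steps for a single seed–voxel pair: 64-point mean-centered segments overlapping by 32 points, a Hanning window, segment-averaged spectra, and a band average over 0–0.15 Hz. It assumes an effective sampling rate of 1 Hz (TR = 2 s, doubled by the time interpolation) and uses NumPy's symmetric `np.hanning`; with 1008 points it averages 30 segments. The function names and synthetic signals are illustrative.

```python
import numpy as np


def segment_dfts(ts, seg_len=64, overlap=32):
    """DFTs of overlapping, mean-centered, Hanning-windowed segments (one row per segment)."""
    step = seg_len - overlap
    window = np.hanning(seg_len)
    segs = np.array([ts[s:s + seg_len] - ts[s:s + seg_len].mean()
                     for s in range(0, len(ts) - seg_len + 1, step)])
    return np.fft.rfft(segs * window, axis=1)


def band_coherence(seed_ts, voxel_ts, fs=1.0, f_max=0.15, seg_len=64, overlap=32):
    """Welch-averaged magnitude-squared coherence, averaged over 0..f_max Hz."""
    X = segment_dfts(seed_ts, seg_len, overlap)
    Y = segment_dfts(voxel_ts, seg_len, overlap)
    s_xx = np.mean(np.abs(X) ** 2, axis=0)   # power spectrum of the seed
    s_yy = np.mean(np.abs(Y) ** 2, axis=0)   # power spectrum of the voxel
    s_xy = np.mean(X * np.conj(Y), axis=0)   # seed DFT times conjugate voxel DFT
    coh = np.abs(s_xy) ** 2 / (s_xx * s_yy)
    freqs = np.fft.rfftfreq(seg_len, d=1.0 / fs)
    return coh[freqs <= f_max].mean()


rng = np.random.default_rng(1)
seed = rng.standard_normal(1008)
coh_self = band_coherence(seed, seed)                         # identical series
coh_lagged = band_coherence(seed, 2.0 * np.roll(seed, 3))     # scaled copy with a 3-sample lag
coh_indep = band_coherence(seed, rng.standard_normal(1008))   # unrelated series
print(f"{coh_self:.2f} {coh_lagged:.2f} {coh_indep:.2f}")
```

Scaling a series leaves coherence unchanged and a short lag reduces it only slightly, while unrelated series give values near the small positive bias expected from averaging about 30 segments.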
Contrasts of condition-specific coherence maps
To identify changes in functional connectivity across conditions, we contrasted the initially learned task and well-practiced task coherence maps. Following Rosenberg et al. (1989), we applied an arc-hyperbolic tangent transform to the coherency magnitude so that the difference between the transformed values approximately follows a zero-centered normal distribution. This transformation allowed us to apply a parametric random-effects group analysis to the difference maps: the difference maps were spatially normalized and submitted to a second-level, two-tailed one-sample t-test, in which the mean estimate across participants at each voxel was tested against zero, to identify regions whose connectivity with the seed ROI differed significantly across conditions. As in the univariate analysis, we first performed an exploratory analysis of the initially learned vs. well-practiced contrast using an uncorrected two-tailed threshold of p < 0.001 and a minimum cluster size of five contiguous voxels. To determine the significance of regions in this exploratory analysis, we applied a whole-brain family-wise error correction (p < 0.05). Additionally, to determine the significance of a priori regions, we performed a small volume correction (family-wise error rate of p < 0.05) for each region using MNI anatomical regions of interest: the left and right hippocampi, parahippocampal gyri, middle frontal gyri, inferior frontal gyri, precentral gyri, fusiform gyri, inferior temporal gyri, caudate, and putamen.
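A sketch of the group contrast, assuming per-subject coherence values have already been sampled at matching (normalized) voxels. Coherence is converted to coherency magnitude, transformed with arctanh, differenced within each subject, and tested against zero with a two-tailed one-sample t-test at each voxel. The arrays and effect sizes are hypothetical, and the sign convention (well-practiced minus initially learned) is an assumption.

```python
import numpy as np
from scipy import stats


def coherency_z(coherence):
    """arctanh of the coherency magnitude sqrt(C); values are clipped below 1 to stay finite."""
    magnitude = np.sqrt(np.clip(coherence, 0.0, 1.0 - 1e-12))
    return np.arctanh(magnitude)


def group_contrast(coh_initial, coh_practiced):
    """Per-subject difference of transformed coherence, then a two-tailed one-sample t-test.

    Both inputs have shape (n_subjects, n_voxels); returns t and p values per voxel.
    """
    diff = coherency_z(coh_practiced) - coherency_z(coh_initial)
    result = stats.ttest_1samp(diff, popmean=0.0, axis=0)
    return result.statistic, result.pvalue


rng = np.random.default_rng(2)
coh_initial = rng.uniform(0.2, 0.4, size=(10, 2))   # 10 hypothetical subjects, 2 voxels
shift = np.array([0.15, 0.0])                        # voxel 0 gains connectivity, voxel 1 does not
coh_practiced = coh_initial + shift + rng.normal(0.0, 0.02, size=(10, 2))
t_vals, p_vals = group_contrast(coh_initial, coh_practiced)
print(t_vals, p_vals)
```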
Subjects showed consistent improvement throughout categorization training in both reaction time and accuracy (Figure 3). At the end of training, subjects were highly accurate (M = 96%, SD = 2.3) and had mean reaction times of less than a second (M = 950 ms, SD = 120). To determine whether the introduction of a new template face at the beginning of each day affected performance, we compared the first blocks of days 2–6 with the last blocks of days 1–5 and found no significant difference [t(9) = 0.92, p > 0.41; first blocks M = 89%, M = 1156 ms; last blocks M = 87%, M = 1232 ms]. This suggests that subjects learned a general strategy and did not simply memorize the specific faces each day.
Figure 3. Behavioral results. (A) Group accuracy and reaction time (N = 10) during the categorization task throughout 6 days of training. (B) Group accuracy and reaction time across blocks during the scanning session for the well-practiced and initially learned categorization tasks.
On day 7, immediately preceding the scanning session, subjects were able to successfully learn the new task within 100 trials (M = 84%, SD = 4.2; M = 1380 ms, SD = 171). This performance was significantly faster and more accurate than performance in the initial training session of the well-practiced task [ACC t(9) = 2.92, p < 0.05, RT t(9) = 3.35, p < 0.05; well-practiced task M = 77%, SD = 4.8; M = 1602 ms, SD = 193], demonstrating that some skill or general task strategy was transferred from the practiced task to the new task.
Comparing the well-practiced and initially learned tasks across all blocks demonstrates that subjects were more adept at executing the well-practiced task (M = 94%, SD = 2.8; M = 1202 ms, SD = 193) than the initially learned task (M = 86%, SD = 3.8; M = 1472 ms, SD = 211). As expected, performance was significantly faster [t(9) = 2.92, p < 0.05] and more accurate [t(9) = 3.35, p < 0.05] on the well-practiced task. The behavioral results also suggest that subjects did not learn or drastically change their strategy during the fMRI session, as there was no significant difference between scans 1, 2, and 3 (see Figure 3). Reaction times for the well-practiced task were slower during scanning than in the final training block [t(9) = 4.95, p < 0.05], most likely because training was self-paced, whereas in the scanner the pace was slower and fixed. After the scanning session, subjects were asked whether they had been aware of the four novel face configurations introduced in the scanning session. None of the subjects reported noticing the novel configurations during scanning, although overall they classified these new faces nearly perfectly (M = 99%, SD = 0.5). This suggests that subjects were learning a general rule rather than memorizing exemplars.
Figure 4 shows the group-averaged univariate parameter estimate maps (positive only) and maps of partial coherence with the right FFA for the categorization task. The univariate and partial coherence analyses show very similar results. Both analyses implicate early visual regions, inferotemporal cortex, medial temporal lobe, thalamus, inferior parietal lobe, lateral prefrontal cortex, anterior cingulate cortex, supplementary motor area, premotor, and motor regions. However, there was positive partial coherence in several brain areas such as the anterior prefrontal cortex, posterior cingulate/precuneus, and temporal/parietal regions that were not activated in the univariate maps.
Figure 4. Task-related activity. (A) Group-averaged unthresholded maps of univariate activity during the initially learned and well-practiced task conditions relative to fixation. Regions more active during fixation are not shown. (B) Group-averaged unthresholded partial coherence maps using the right fusiform face area (R FFA) as a seed for the initially learned and well-practiced task conditions. L Fus. = left fusiform, R FFA = right fusiform face area, MTL = medial temporal lobe, Inf. Occ/Temp = inferior occipital/temporal lobe, DLPFC = dorsolateral prefrontal cortex, ACC = anterior cingulate cortex, PMC = premotor cortex, SupraMarg. = supramarginal gyrus, Sup. Occ. = superior occipital gyrus, SMA = supplementary motor area, Inf. Par. = inferior parietal.
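The partial coherence maps quantify frequency-domain coupling between the right FFA seed and each other region after removing linear influences shared with other measured signals. One common way to compute this (assumed here for illustration; not necessarily the exact pipeline used in this study) is to estimate the cross-spectral density matrix S(f) of all signals and invert it: with G(f) = S(f)^-1, the partial coherence between signals i and j is |G_ij|^2 / (G_ii G_jj), whereas ordinary coherence is |S_ij|^2 / (S_ii S_jj). The sketch below uses a Welch-style estimate on synthetic white-noise data; the function names, segment length, and three-signal setup are hypothetical choices, not parameters from the study.

```python
import numpy as np


def cross_spectral_matrix(signals, seg_len=64):
    """Welch-style cross-spectral density matrix.

    signals: array of shape (n_regions, n_timepoints).
    Returns an array of shape (n_freqs, n_regions, n_regions).
    """
    n_regions, n_time = signals.shape
    step = seg_len // 2                      # 50% overlap between segments
    window = np.hanning(seg_len)
    total, n_segs = None, 0
    for start in range(0, n_time - seg_len + 1, step):
        seg = signals[:, start:start + seg_len]
        seg = seg - seg.mean(axis=1, keepdims=True)
        spec = np.fft.rfft(seg * window, axis=1)            # (regions, freqs)
        # S_ij(f) = X_i(f) * conj(X_j(f)) for every frequency f
        outer = np.einsum("if,jf->fij", spec, spec.conj())
        total = outer if total is None else total + outer
        n_segs += 1
    if n_segs == 0:
        raise ValueError("time series shorter than one segment")
    return total / n_segs


def coherence(S, i, j):
    """Ordinary (squared) coherence between signals i and j at each frequency."""
    return np.abs(S[:, i, j]) ** 2 / (S[:, i, i].real * S[:, j, j].real)


def partial_coherence(S, i, j):
    """Partial coherence of i and j given all other signals, via S(f)^-1."""
    G = np.linalg.inv(S)                     # inverts each frequency's matrix
    return np.abs(G[:, i, j]) ** 2 / (G[:, i, i].real * G[:, j, j].real)


# Synthetic demo: x and y are related only through a shared driver z.
rng = np.random.default_rng(0)
n = 4096
z = rng.standard_normal(n)
x = z + 0.5 * rng.standard_normal(n)         # stand-in "seed" region
y = z + 0.5 * rng.standard_normal(n)         # stand-in "target" region
S = cross_spectral_matrix(np.vstack([x, y, z]))
coh_xy = coherence(S, 0, 1)
pcoh_xy = partial_coherence(S, 0, 1)
print("mean coherence:", coh_xy.mean())
print("mean partial coherence given z:", pcoh_xy.mean())
```

Here ordinary coherence between x and y is high because both follow z, while partial coherence given z falls close to zero. Real fMRI runs have far fewer time points, so the segment length trades frequency resolution against the number of averaged segments, and S(f) is only invertible when there are at least as many segments as signals.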
Initially learned vs. well-practiced categorization performance – univariate
Although similar networks were involved in both initially learned and well-practiced face categorization, several regions differed in activity between the initially learned and well-practiced tasks (see Table 1 and Figure 5 ). However, after stricter corrections, including a whole-brain family-wise error rate correction (p < 0.05) and small volume corrections, no regions differed significantly between the two conditions. Though there was a slight trend for the left FFA to be more active during the well-practiced task [t(8) = 1.92, p = 0.22], neither the right nor the left FFA showed significant differences in activation between the two tasks (Figure 6 ).
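The gap between an uncorrected threshold and a whole-brain family-wise error (FWE) correction can be large. SPM-style FWE correction uses random field theory, which accounts for spatial smoothness; as a simpler and more conservative stand-in, the Bonferroni correction sketched below divides the FWE alpha by the number of tests. The voxel count of 50,000 is a hypothetical value, and thresholds are expressed on a one-tailed z scale for simplicity rather than on the t scale of the group maps.

```python
from statistics import NormalDist


def one_tailed_z(p):
    """z value whose upper-tail probability under N(0, 1) equals p."""
    return NormalDist().inv_cdf(1 - p)


def bonferroni_alpha(fwe_alpha, n_tests):
    """Per-test alpha that bounds the family-wise error rate at fwe_alpha."""
    return fwe_alpha / n_tests


n_voxels = 50_000  # hypothetical number of in-brain voxels
per_voxel = bonferroni_alpha(0.05, n_voxels)
print(f"uncorrected p < 0.001 -> z > {one_tailed_z(0.001):.2f}")
print(f"Bonferroni p < {per_voxel:.0e} -> z > {one_tailed_z(per_voxel):.2f}")
```

A small volume correction applies the same logic to only the voxels in an a priori region, so the number of tests, and hence the threshold, is smaller than for the whole brain.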
Figure 5. Group activity and partial coherence differences between categorization with initially learned versus well-practiced tasks. (A) Statistical parametric t maps contrasting activity in the initially learned task and well-practiced task blocks. (B) Statistical parametric t maps contrasting partial coherence with the right FFA in the initially learned task and well-practiced task blocks. T maps are overlaid on a standard T1-weighted anatomical image. VLPFC – ventrolateral prefrontal cortex, PMC – premotor cortex, SMG – supramarginal gyrus, IPL – inferior parietal lobe, STG – superior temporal gyrus, hipp – hippocampus.
Figure 6. Univariate parameter estimates (A) and right FFA partial coherence values (B) for regions that showed significant differences between the initially learned and well-practiced tasks, indicated by **.
Initially learned vs. well-practiced categorization performance – partial coherence
The partial coherence analysis revealed a general increase in connectivity with practice: nearly all regions that showed significant coherence changes were more coherent with the right FFA during categorization in the well-practiced task (see Table 1 and Figure 5 ). Additionally, the functionally defined left FFA was significantly more coherent with the right FFA during the well-practiced task than during the initially learned task [t(8) = 3.22, p < 0.05]. A small but significant focus in the left anterior hippocampus (abutting the amygdala and parahippocampal gyrus) also showed more partial coherence with the right FFA during the well-practiced task. Lastly, a right premotor region in the precentral gyrus, corresponding to Brodmann area 6, was significantly more coherent with the right FFA during the well-practiced task. Univariate and partial coherence values from these regions are displayed in Figure 6 .
Increased signal-to-noise ratio (SNR) during the well-practiced task could in principle have produced both increased univariate activity and increased partial coherence. However, several brain areas, such as the anterior prefrontal cortex, posterior cingulate/precuneus, and temporal/parietal regions, showed positive coherence without being activated in the univariate maps, suggesting that partial coherence captures something different from activation magnitude. Additionally, several regions showed univariate changes in activity (see Table 1 ) but no partial coherence changes, and vice versa. The initially learned vs. well-practiced contrast provides an example suggesting that partial coherence yields more categorization-specific results. At a lower threshold (p < 0.001 uncorrected), a left PMC region approximately 10 mm ventral to Rolando’s genu (the location of hand primary motor cortex; Herve et al., 2005 ) was more active for the initially learned task than for the well-practiced task. This is likely due to the longer reaction times/duty cycle for the initially learned task rather than to anything specific to visual categorization. In contrast, a premotor region approximately 20 mm more anterior (outside the hand area) showed greater partial coherence with the right FFA during the well-practiced task. This is consistent with studies showing greater premotor involvement during the execution of a stereotyped response, and suggests that this region is specifically related to increased proficiency with the categorization task. Together, these observations suggest that univariate activity and partial coherence measure somewhat different aspects of brain function, and that partial coherence may capture aspects more specific to categorization.
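The SNR concern can be made concrete with a textbook noise model (an assumption for illustration, not a model fitted to these data). If two regions share a signal s and each adds independent noise, so that x = s + n_x and y = s + n_y, then at each frequency S_xx = S_s + S_nx, S_yy = S_s + S_ny, and S_xy = S_s, giving coherence = [SNR_x / (1 + SNR_x)] * [SNR_y / (1 + SNR_y)] with SNR = S_s / S_n. Coherence therefore rises with SNR even when the underlying coupling is unchanged. The function name below is hypothetical.

```python
def coherence_shared_signal(snr_x, snr_y):
    """Expected coherence of x = s + n_x and y = s + n_y at one frequency.

    Assumes the noise terms are independent of s and of each other, and
    snr_* = (signal power) / (noise power) at that frequency.
    """
    if snr_x < 0 or snr_y < 0:
        raise ValueError("SNR must be non-negative")
    return (snr_x / (1 + snr_x)) * (snr_y / (1 + snr_y))


# Same shared signal in every case; only the noise level changes.
for snr in (0.5, 1.0, 4.0):
    print(snr, round(coherence_shared_signal(snr, snr), 3))
```

Under this model an SNR increase alone would raise coherence everywhere the signal is shared, which is one reason the dissociations between univariate and coherence effects described above carry more weight than coherence increases on their own.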
The current results demonstrate that well-practiced visual categorization uses a network of brain regions similar to that used for newly learned categorization rather than recruiting an alternative network. The regions implicated in the task-related univariate analysis overlapped substantially with those implicated in the partial coherence analysis, suggesting that regions active during categorization are also functionally connected with the right fusiform face area (FFA). The results also demonstrate an overall training-related increase in functional connectivity between task-related regions and the right FFA, suggesting that visual category expertise develops through strengthened connections between task-related regions. Though network activity and connectivity remained largely the same for the initially learned and well-practiced tasks, there were also notable regional differences. Specifically, the left FFA was more coherent with the right FFA during the well-practiced task, suggesting greater coordination between visual regions as a mechanism of visual category learning. Additionally, left MTL and right PMC showed greater coordination with the right FFA during the well-practiced task, suggesting their involvement in the retrieval and representation of well-learned categories.
Overall Networks Involved Early and Late in Categorization Training
The results demonstrate that a similar distributed network of regions supports both initially learned and well-practiced visual categorization, rather than distinct networks. This suggests that visual category learning involves relatively small changes within the same network rather than a shift to a new network (Kelly and Garavan, 2005 ). It also suggests that similar cognitive and neural strategies were used for both tasks, and that over the course of learning subjects became more efficient at using the same strategy rather than developing a new one (Jonides, 2004 ). These results are consistent with recent fMRI studies of category prototype learning (Little and Thulborn, 2005 , 2006 ; Little et al., 2006 ). In these tasks, subjects explicitly learned to classify dot-patterns into one of four categories based on their similarity to category prototype patterns. Little and colleagues found no significant change in either the distribution or the magnitude of the BOLD signal between initial categorization (without any training) and well-practiced categorization (after either 750 or 2150 trials of training) (Little and Thulborn, 2005 , 2006 ; Little et al., 2006 ).
Though the current category learning task used faces instead of dot-patterns and involved fewer categories, fewer exemplars, and more training, the current results are consistent with Little et al.’s findings. Our results further demonstrate that, in addition to eliciting a similar distributed activation pattern, the functional connectivity between these distributed regions is largely similar throughout learning. Together, this suggests that visual categorization is accomplished by a similar functionally connected network rather than by the recruitment of a new network. Studies with additional training will be useful in further delineating the time course of this network’s involvement. It would also be useful to explore whether the current results generalize to other learning strategies, tasks, and fMRI designs. Because the current study assessed initially learned and well-practiced performance in one fMRI session, there may have been some transfer of learning strategy between the tasks, biasing the results toward showing the same network at different time points in learning. Separate scans early and late in learning would therefore provide a useful comparison to the current results. Also, the current study provided feedback and small monetary rewards to motivate rapid learning. However, categories can be learned incidentally in the absence of any reward, and such learning may rely on mechanisms distinct from reward-based category learning (Reber et al., 2003 ). Future studies would be helpful to determine whether the recruitment of a consistent network throughout learning applies to these other forms of category learning.
In addition to the recruitment of a consistent network, it is notable that partial coherence increased overall with learning, with relatively few decreases (see Figure 5 and Table 1 ). This suggests that stronger coordination between task-related brain regions may be a general mechanism underlying improvements in performance. Recent studies of impaired populations support this idea (Bokde et al., 2006 ; DeGutis et al., 2007 ; He et al., 2007a ,b ). Bokde et al. (2006) compared healthy controls to mild cognitive impairment (MCI) patients during a face matching task and found overall greater correlations between the right fusiform gyrus and task-related regions in healthy controls than in MCI patients. DeGutis et al. (2007) demonstrated that improvements in face processing in a prosopagnosic individual following rehabilitation training corresponded with widespread increases in functional connectivity with the right FFA and very few decreases. These studies suggest that better performance is related to greater functional connectivity. However, this effect may be specific to complex tasks that require coordination among a broad network of regions. Schwartz et al. (2002) extensively trained subjects on visual texture discrimination and found that learning was specific to the trained eye and that functional connectivity between visual and frontal regions decreased after training. Future studies varying task complexity and amount of training are necessary to better characterize the timing and task constraints that produce connectivity changes. Additionally, repetitive transcranial magnetic stimulation (rTMS) could be used to create virtual lesions in regions that significantly increase their coherence, in order to assess the behavioral relevance of these functional connectivity changes.
Though the results demonstrate an overall similar functionally connected network being employed early and late in category learning, there were a few notable increases in regional connectivity with learning in ITC, MTL, and PMC, which are described below.
Increased Connectivity between Face-Selective Regions
Our results suggest that increased coordination between the left and right FFA supports improvements in visual processing with category learning. Univariate ROI analysis of the right and left FFA showed no significant difference in the magnitude of activity between the initially learned and well-practiced tasks. This contrasts with studies demonstrating that perceptual training generally increases activity in regions that represent or process the trained stimuli (Op de Beeck et al., 2006 ; Jiang et al., 2007 ). However, partial coherence between the right and left FFA was significantly greater during the well-practiced task, suggesting increased coordination between the processes performed by, or the representations held in, these regions. In the right FFA, these processes or representations are likely related to computing specific spatial relations between facial features and to integrating feature identities and spacings into a holistic percept (Yovel and Kanwisher, 2004 ). In the left FFA, they are likely related to more parts-based analysis of faces, as this region has shown more activity when matching face parts compared to whole faces (Rossion et al., 2000 ). Thus, our findings suggest that training increased the coordination between holistic processing/representations in the right FFA and parts-based processing/representations in the left FFA.
This learning-related increase in coordination between ITC regions may indicate that well-established categories are represented by functional connections between ITC regions. Several studies have demonstrated reliable distributed patterns of both supra- and sub-threshold voxels throughout ITC when viewing a variety of well-established categories such as chairs, shoes, and scissors (Ishai et al., 1999 ; Haxby et al., 2001 ; Hanson et al., 2004 ; O’Toole et al., 2005 ). Additionally, Op de Beeck et al. (2006) showed that visual training produces distinct distributed activation patterns to trained stimuli that are not predicted by pre-training activations. This suggests that distributed activations are integral both to category representations and to learning new categories. A recent study suggests that these distributed activations are also functionally connected (Moeller et al., 2008 ). Using simultaneous microstimulation and fMRI in macaques, Moeller and colleagues stimulated face- and object-selective patches while measuring fMRI activity in other face and object patches. They found that microstimulation activated, to varying degrees, distinct networks of ipsi- and contralateral patches, demonstrating that these regions are functionally connected. The current results extend this finding by suggesting that as categories become more established, their functional connections in ITC strengthen, and further suggest that functional connectivity changes may precede activity changes during visual learning. Future studies with additional training and other object categories would be useful to further characterize these learning-related changes in ITC.
High-Level Feature Binding in Left Medial Temporal Lobe
Similar to the left FFA, partial coherence between the left medial temporal lobe (MTL) and the right FFA increased with learning, likely reflecting increased proficiency in individuating the face stimuli. This finding is consistent with recent studies demonstrating that MTL regions are more active when making judgments about well-learned compared to poorly learned information (Yanike et al., 2004 ; DeGutis and D’Esposito, 2007 ). It is also consistent with studies showing that MTL regions are important for visual perception, in particular for making fine-grained object discriminations that rely on conjunctions of features (Bussey et al., 2002 ). These fine-grained discrimination mechanisms are engaged more when making more specific judgments about objects and faces (such as “zebra” instead of “living thing”). Correspondingly, left MTL has been shown to be more active when making more specific compared to less specific object categorizations (Tyler et al., 2004 ) and is recruited during face individuation (Furl et al., 2007 ). Additionally, left MTL is more active in car and bird experts than in novices while viewing their stimuli of expertise (Gauthier et al., 2000 ). Together, this suggests that the left medial temporal lobe may be involved in binding object features when making subordinate level category judgments and is increasingly involved with visual category learning. Future studies with additional types of visual categories will be important to determine whether the involvement of the medial temporal lobe is specific to subordinate level expertise. Also, studies with higher resolution fMRI could determine the specific contributions of subregions of the medial temporal lobe (e.g. hippocampus, parahippocampus, or perirhinal cortex) to these effects.
This role of the left MTL in individuation and feature binding may be particular to categories that are initially learned in an explicit manner. Nomura et al. (2007) compared explicit and implicit category learning and found increased left hippocampal (HC) activity during explicit category learning. Also, Reber et al. (2003) showed that left HC is more involved in explicit than in implicit category retrieval. Furthermore, DeGutis and D’Esposito (2007) found that left HC responded more when retrieving explicitly learned exemplars farther from rather than closer to the category boundary. This HC category boundary effect was present even during a perceptual task with the same stimuli in which subjects were not instructed to explicitly categorize the stimuli, suggesting that the involvement of the left HC is relatively automatic. The current study extends these findings by suggesting that the left MTL is automatically recruited when successfully retrieving stimuli throughout extended learning and that its involvement is likely particular to explicitly learned categories.
It is notable that, in contrast to the MTL, the basal ganglia (BG) did not show significant activity or connectivity changes with learning. Previous reports show that the basal ganglia are integral to the initial stages of both implicit and explicit category learning (Seger, 2008 ) and are also recruited for categorization judgments after extended training (DeGutis et al., 2007 ). We previously demonstrated, after extensive category training on a task similar to the current one, that the BG were more responsive to faces close to as opposed to far from the category boundary (DeGutis et al., 2007 ). Unfortunately, this contrast was underpowered in the current design. It is possible that the BG distance-to-boundary effect is present both early and late in learning, resulting in no significant change in BG involvement across learning in the current results. Alternatively, the BG may be less involved in the initially learned task because considerable learning had already taken place through transfer from the well-practiced task, making initial learning less dependent on the BG.
Premotor Cortex, Retrieval, and Response Selection
In addition to ITC and left MTL, our results demonstrate that right PMC increased its coherence with the right FFA during the well-practiced task, suggesting its involvement in the retrieval or representation of category responses. The right lateralization of this region suggests that it may not be directly related to subjects’ right-handed category responses. However, recently learned motor skills have been shown to be supported by regions specific to the learned movement, whereas long-term learning (∼4 h of training over 3 weeks) has been shown to involve more of a bi-hemispheric network (Floyer-Lea and Matthews, 2005 ). Thus, this region could be part of the premotor network involved in executing the category response. The increased coordination of the right PMC with the right FFA through learning is consistent with the finding that PMC damage impairs the retrieval of previously learned responses to visual stimuli (Halsband and Passingham, 1982 ). Additionally, these results are in line with Wallis and Miller’s (2003) demonstration that responses of PMC neurons are selective to well-learned rules, and with the finding that rule-selective activity in PMC precedes that in PFC and BG (Muhammad et al., 2006 ). These results also fit well with the late stage learning predictions of a recent model of categorization automaticity by Ashby et al. (2007) (though not with its early stage prediction of basal ganglia involvement), in which extended procedural category learning leads to strengthened cortical-cortical connections from sensory association areas directly to premotor cortex, in this case from right FFA to right PMC.
Explicit categorization of recently learned visual categories is accomplished by a dynamic interaction of inferotemporal cortex, medial temporal lobe, prefrontal, premotor, and motor cortices. Both initially learned and well-practiced categorization recruit this network of regions, rather than practice leading to the recruitment of a distinct network. With practice, the connectivity between the right FFA and this network is strengthened. In particular, visual analysis may be accomplished more efficiently due to increased connectivity between ITC regions. Additionally, subjects improve at individuating stimuli and retrieving categories, likely through increased ITC and MTL connectivity. Finally, right premotor cortex shows increased connectivity with ITC, likely related to increasing efficiency in selecting the appropriate category response.
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
We would like to thank Dr. Felice Sun for developing the coherence analysis and for help with applying this analysis to the current study. We would like to thank Dr. Shawn Ell and Dr. Charlotte Boettiger for feedback and useful comments. This work was supported by a grant from the National Institutes of Health.