An Associational Model of Birdsong Sensorimotor Learning I. Efference Copy and the Learning of Song Syllables

Todd W. Troyer (1, 3) and Allison J. Doupe (1, 2, 3, 4)

(1) Department of Psychiatry, (2) Department of Physiology, (3) W. M. Keck Center for Integrative Neuroscience, and (4) Sloan Center for Theoretical Neurobiology at UCSF, University of California, San Francisco, California 94143-0444


    ABSTRACT

Troyer, Todd W. and Allison J. Doupe. An Associational Model of Birdsong Sensorimotor Learning I. Efference Copy and the Learning of Song Syllables. J. Neurophysiol. 84: 1204-1223, 2000. Birdsong learning provides an ideal model system for studying temporally complex motor behavior. Guided by the well-characterized functional anatomy of the song system, we have constructed a computational model of the sensorimotor phase of song learning. Our model uses simple Hebbian and reinforcement learning rules and demonstrates the plausibility of a detailed set of hypotheses concerning sensory-motor interactions during song learning. The model focuses on the motor nuclei HVc and robust nucleus of the archistriatum (RA) of zebra finches and incorporates the long-standing hypothesis that a series of song nuclei, the Anterior Forebrain Pathway (AFP), plays an important role in comparing the bird's own vocalizations with a previously memorized song, or "template." This "AFP comparison hypothesis" is challenged by the significant delay that would be experienced by presumptive auditory feedback signals processed in the AFP. We propose that the AFP does not directly evaluate auditory feedback, but instead, receives an internally generated prediction of the feedback signal corresponding to each vocal gesture, or song "syllable." This prediction, or "efference copy," is learned in HVc by associating premotor activity in RA-projecting HVc neurons with the resulting auditory feedback registered within AFP-projecting HVc neurons. We also demonstrate how negative feedback "adaptation" can be used to separate sensory and motor signals within HVc. The model predicts that motor signals recorded in the AFP during singing carry sensory information and that the primary role for auditory feedback during song learning is to maintain an accurate efference copy. 
The simplicity of the model suggests that associational efference copy learning may be a common strategy for overcoming feedback delay during sensorimotor learning.


    INTRODUCTION

The combination of a well-characterized, stereotyped behavior and specialized anatomy makes birdsong an ideal system in which to study the neural basis of motor learning. Moreover, song learning shares important similarities with human speech learning (Doupe and Kuhl 1999). In birds, vocal learning is accomplished in two phases. During an initial, sensory phase, birds listen to and memorize a tutor song, often called the "template" (Konishi 1965; Marler 1964). In a later, sensorimotor phase, birds gradually match their vocalizations to the memorized song, using auditory feedback from their own vocalizations (Fig. 1, A and B). We have constructed a computational model demonstrating that simple associational (Hebbian) learning rules are sufficient to address important problems related to the sensorimotor learning of song. Our model focuses on the zebra finch, a species commonly used in physiological investigations of song learning. Zebra finch song consists of a stereotyped sequence of vocal gestures or "syllables." In this paper, we focus on the learning of the individual syllables. In the following companion paper (Troyer and Doupe 2000), we extend our model to include sequence learning.



Fig. 1. The song system. A: developmental time course. During sensory learning, birds memorize a song from their tutor. Our model assumes that this process has already been completed. During sensorimotor learning, birds use auditory feedback from their own vocalizations to match their song to the memorized template. These stages of learning may overlap. After learning, song "crystallizes," becoming more stable and less dependent on auditory feedback. B: behavioral schematic of sensorimotor learning (cf. Konishi 1965). C: song system anatomy: Anterior Forebrain Pathway (AFP) (gray); motor pathway (white). Field L (black) receives input from auditory thalamus and provides direct and/or indirect auditory input to HVc (Fortune and Margoliash 1995; Janata and Margoliash 1999; Vates et al. 1996). D: schematic of the "AFP comparison hypothesis." Note that the 100-ms estimated latency (see Model and approach) for motor signals to leave the robust nucleus of the archistriatum (RA), return as auditory feedback via Field L, and then be processed in the AFP is nearly as long as a typical song syllable. Thus, the evaluation of auditory feedback from one syllable would arrive in RA during the motor activity for the subsequent syllable.

The likely neural substrate for sensorimotor learning is the song system, a set of brain nuclei specialized for vocal learning and production (Nottebohm et al. 1976) (Fig. 1C). The motor pathway for song includes the direct projection from nucleus HVc (used as a proper name; Margoliash et al. 1994) to the robust nucleus of the archistriatum (RA). Both nuclei display neural activity time-locked to song production (McCasland 1987; Yu and Margoliash 1996), and lesions in either nucleus disrupt normal song production at all stages of development (Nottebohm et al. 1976; Simpson and Vicario 1990). HVc and RA are also connected by an indirect pathway, the Anterior Forebrain Pathway (AFP). Lesion studies indicate that the AFP is crucial for song learning, but is not necessary for normal song production in adults (Bottjer et al. 1984; Scharff and Nottebohm 1991; Sohrabji et al. 1990). These and other data (see Biologically supported assumptions) have led to the "AFP comparison hypothesis," in which the AFP guides sensorimotor learning by transmitting a comparison between auditory feedback from the bird's own vocalizations and the memorized template (Bottjer and Arnold 1986; Doupe 1993; Mooney 1992; Nordeen and Nordeen 1988; Saito and Maekawa 1993). These comparison signals are used to guide learning in the motor pathway at the level of RA (Fig. 1, C and D).

The AFP comparison hypothesis is challenged by a fundamental problem in motor learning, the problem of feedback delay (Lashley 1951; Miall and Wolpert 1996; Miles and Evarts 1979). In zebra finches, the 100-ms estimated latency (see Fig. 2) for presumptive AFP comparison signals to arrive in the motor pathway after a motor command is nearly as long as a typical song syllable. This delay would cause comparison signals for one syllable to have greatest overlap with the neural activity for the subsequent syllable and poses a significant challenge to the notion that AFP comparison signals guide learning in RA (see Bottjer and Arnold 1986). In our model, we retain the hypothesis that the AFP plays an important role in template comparison but propose that instead of waiting for the actual auditory feedback, an internal prediction or "efference copy" of the auditory feedback is generated within HVc to guide song learning. Therefore, we predict that the signals recorded in the AFP during singing (Hessler and Doupe 1999a,b) are motor signals that also carry sensory information. Furthermore, our model suggests a functional reason for why the AFP is located downstream of the motor nucleus HVc (Fig. 1, C and D): use of an efference copy requires that brain areas involved in template comparison receive motor efferents.

Preliminary versions of this work have been presented in conference proceedings (Troyer et al. 1996a,b).

Model and approach

Over the past 25 years, anatomical, lesion, and in vivo physiology studies have yielded a wealth of data concerning the functional anatomy of the song system. However, current hypotheses regarding the sensory-motor interactions during song learning lack detail. To explore these issues, we set out to build a computational model of the sensorimotor phase of song learning. Our goal was to determine if basic theoretical problems in sensorimotor learning could be solved using simple rules of associational plasticity, constrained by the known anatomy of the song circuit. We hoped to direct future experiments by identifying important gaps in our knowledge, as well as to evaluate previous experimental results from a computational point of view.

Our efforts resulted in two closely related models, addressing the problem of song learning at different levels of abstraction. The first model is a purely "conceptual model," i.e., a self-consistent set of functional hypotheses conforming to a wide range of experimental results. The functional hypotheses contained in this model constitute the core contribution of our research. The second model is a true "computational model" that incorporates these hypotheses into a working computer algorithm. Due to the very limited knowledge of the song system at the level of local circuits, implementing this algorithm required a number of specific assumptions that reach beyond current experimental knowledge. As a result, several aspects of the computational model are not well-constrained by biology. Moreover, we made a number of simplifying assumptions to ensure that simulations could be run in a reasonable amount of time. However, the computational model played an important role in exploring our initial functional ideas and serves to illustrate our core conceptual hypotheses. Perhaps more importantly, the construction of a working computational algorithm demonstrates the mutual consistency of our hypotheses and provides a theoretical demonstration that they are sufficient to account for important aspects of song learning. This dual approach not only highlights general problems of sensorimotor learning and generates testable predictions at a functional level but also provides a framework for understanding how specific biological mechanisms may contribute to their solution. These models are only a first step, and, of necessity, contain many simplifications. However, taken together, they constitute the most detailed set of hypotheses to date regarding the interaction of sensory and motor signals during the sensorimotor phase of song learning.

In this section, we present the justification for our working biological assumptions. We then describe the main problems addressed by our model and outline the key elements of our proposed solution. Finally, we present our conceptual model, which describes our functional hypotheses in greater detail. In the METHODS section, we outline the theoretical assumptions incorporated into our working computational model, including a description of the network architecture and the simple encoding scheme used to represent song. In the RESULTS section, we present quantitative results generated by our computational model. Details of the computer algorithm are confined to an APPENDIX.

Biologically supported assumptions

Although the nature of template memorization is largely unknown, various lines of evidence suggest that the AFP may transmit a comparison between the bird's own vocalizations and the memorized tutor song. We call such signals "template comparison signals." Initial evidence suggesting a role for the AFP in template comparison came from lesion experiments: AFP lesions in juvenile zebra finches disrupt song learning, whereas lesions in adult birds have little effect on normal song production (Bottjer et al. 1984; Nottebohm et al. 1976; Scharff and Nottebohm 1991; Sohrabji et al. 1990). Further experiments have shown that the lateral portion of the magnocellular nucleus of the anterior neostriatum (LMAN), the output nucleus of the AFP, appears to be necessary whenever the song changes, even in adulthood (Brainard and Doupe 2000; Morrison and Nottebohm 1993; Williams and Mehta 1999). Other experiments suggest that circuitry within the AFP may function as a template: AFP neurons develop song-selective auditory responses during song learning (Doupe 1997; Solis and Doupe 1997), and a subset of these neurons respond vigorously to the tutor song (Solis and Doupe 1997, 1999). Using a more direct approach, Basham et al. (1996) showed that local blockade of N-methyl-D-aspartate (NMDA) receptors in the AFP specifically during song memorization disrupts normal song learning.

Within the framework of our model, the simplest hypothesis is that the AFP not only transmits a template comparison signal, but that it also computes the match between the efference copy and the memorized template, i.e., the AFP is the storage site for the tutor template. We did not attempt to model the AFP circuitry that subserves template comparison but rather viewed the AFP as a "black box" that performs the necessary calculations. An alternative hypothesis that is still consistent with the basic structure of our model is that the AFP transmits a template comparison signal, but that memorized template information is stored closer to the auditory periphery than the AFP (see DISCUSSION).

Additional studies into the functional anatomy of the song system have shown that the neurons that project to RA and those that project to the AFP form distinct populations within HVc (Nordeen and Nordeen 1988). We denote these two populations HVc_RA and HVc_AFP. While the evidence is indirect, these two populations are likely to be highly interconnected (Fortune and Margoliash 1995; Vu and Lewicki 1994). Various data suggest that activity within HVc_RA neurons is more closely tied to motor behavior, whereas activity within HVc_AFP neurons is more closely tied to auditory input (Katz and Gurney 1981; Kimpo and Doupe 1997; Lewicki 1996; Saito and Maekawa 1993; but see Doupe and Konishi 1991; Vicario and Yohay 1993). Moreover, experiments in singing birds suggest that the motor pathway is arranged hierarchically, with RA encoding the detailed motor program for each song syllable, and the central pattern generator for song sequence lying upstream of RA, perhaps in HVc (Vu et al. 1994; Yu and Margoliash 1996).

The main biologically supported assumptions that are incorporated into the model are summarized in Table 1.


                              
Table 1. Biologically supported assumptions

The final data included in the model were the estimated latencies between various song nuclei (Fig. 2A). We included only the best studied neural pathways in the song system, as the functional significance of other signaling pathways remains unclear (see Foster and Bottjer 1998; Foster et al. 1997; Striedter and Vu 1998; Vates et al. 1997). We used 50 ms for the latency from HVc premotor activity to vocal output (McCasland 1987; McCasland and Konishi 1981), and 15 ms for auditory latencies to HVc (Margoliash and Fortune 1992). Estimating the processing time through the AFP during song was more problematic, since activity in LMAN, the output nucleus of this pathway, is quite variable. We used 45 ms for the latency to LMAN (A. J. Doupe 1997; personal observations). Subtracting 15 ms for the latency to HVc and adding 10 ms for the delay between LMAN and RA, we obtained a processing time through the AFP of roughly 40 ms. Simulated syllables were 80-ms long with a 35-ms gap between syllables (Fig. 2B), typical of mean values for zebra finch song (M. Brainard, personal communication; Scharff and Nottebohm 1991; Zann 1993). These timing data suggest that, on average, presumptive template comparison signals from the AFP will have the greatest overlap with motor activity for the subsequent syllable (Fig. 2C, dotted box).
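The timing argument can be checked with simple arithmetic. The following Python sketch is illustrative only. It uses the latencies quoted above, takes the 5-ms HVc_RA → RA and HVc_RA → HVc_AFP latencies from Fig. 7, and treats the RA activity for each syllable as a rectangular 80-ms burst (an assumption). It then measures how much of the presumptive comparison signal for one syllable overlaps the RA activity for each syllable in the sequence.

```python
# Latencies (ms) taken from the text and from Figs. 2 and 7.
HVC_RA_TO_RA = 5          # HVc_RA premotor activity -> RA
HVC_RA_TO_HVC_AFP = 5     # direct HVc_RA -> HVc_AFP input (Fig. 7)
RA_TO_VOCAL = 45          # RA -> vocal output (50 ms from HVc minus 5 ms)
AUDITORY_TO_HVC = 15      # vocal output -> HVc
AFP_PROCESSING = 40       # HVc -> AFP -> LMAN -> RA

SYLLABLE_MS = 80
GAP_MS = 35


def overlap(a_start, a_end, b_start, b_end):
    """Length (ms) of the intersection of two half-open intervals."""
    return max(0, min(a_end, b_end) - max(a_start, b_start))


def afp_loop_delay():
    """Delay for RA motor activity to return to RA via auditory feedback and the AFP."""
    return RA_TO_VOCAL + AUDITORY_TO_HVC + AFP_PROCESSING


def feedback_lag_in_hvc_afp():
    """Lag between efference copy and auditory feedback arriving in HVc_AFP."""
    auditory = HVC_RA_TO_RA + RA_TO_VOCAL + AUDITORY_TO_HVC  # 5 + 45 + 15 ms
    return auditory - HVC_RA_TO_HVC_AFP


def comparison_overlaps(n_syllables=3):
    """Overlap (ms) of syllable 0's comparison signal with RA activity for syllables 0..n-1."""
    period = SYLLABLE_MS + GAP_MS
    start = afp_loop_delay()
    end = start + SYLLABLE_MS
    return [overlap(start, end, k * period, k * period + SYLLABLE_MS)
            for k in range(n_syllables)]
```

Here `comparison_overlaps()` returns `[0, 65, 0]`. The signal evaluating syllable 0 never overlaps syllable 0 itself but covers 65 ms of the next syllable, as indicated by the dotted box in Fig. 2C. `feedback_lag_in_hvc_afp()` returns 60, the offset used in Figs. 5 and 7.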



Fig. 2. Timing within the song system. A: numbers represent estimated latencies between song nuclei (see RESULTS); 40 ms represents the entire processing time for signals passing through the AFP. B: the length of model syllables (M. Brainard, personal communication; Scharff and Nottebohm 1991; Zann 1993). C: time delay (100 ms) for motor activity to return to RA via auditory feedback (45 + 15 ms) and the AFP (40 ms). A signal transmitted by the AFP that carries the match between the syllable just sung and the memorized template will arrive in RA during the motor activity for the next syllable (dotted box).

Problems addressed

In this paper, we address the problem of learning a collection of motor representations corresponding to song syllables stored within a memorized template. For simplicity, we do not address learning the detailed temporal structure within each syllable, nor learning the length of syllables and inter-syllable gaps. Our model rests on two key assumptions: 1) song learning is accomplished using simple associational learning rules and 2) the AFP guides song learning by transmitting a signal that carries information about the match between the bird's auditory feedback and a stored template. Here, we present a brief outline of the main problems addressed by our model and the key functional hypotheses that underlie our solutions (see Table 2). More detail regarding our hypothesized solutions is presented in the form of a conceptual model (see Conceptual model) and a computational model (see RESULTS). The presentation of both models is structured according to the following outline.


                              
Table 2. Functional hypotheses for syllable learning

The first problem we address is the important problem of auditory feedback delay: presumptive AFP comparison signals would arrive in RA during the neural activity for the next syllable (Fig. 2C). We hypothesize that the AFP does not directly evaluate auditory feedback, but instead, receives an internally generated prediction of the sensory feedback resulting from song-related motor activity (Table 2, number 1). Such an internal prediction requires a transformation from motor to sensory coordinates and has been termed efference copy (von Holst and Mittelstaedt 1980), "corollary discharge" (Sperry 1950), or the result of a "forward model" (reviewed in Jordan 1995; Miall and Wolpert 1996). We will use the term efference copy. Sensory signals resulting from motor behavior have been termed sensory "reafference" (von Holst and Mittelstaedt 1980). We further hypothesize that the motor → sensory efference copy develops between the two populations of HVc projection neurons (Table 2, number 2). To learn this mapping, it is important that our associational plasticity rule is "temporally asymmetric," i.e., presynaptic activity must be followed by postsynaptic activity to induce plasticity (Table 2, number 3).

The second problem we address is the nature of AFP-guided syllable learning in RA. We make two functional hypotheses. First, we hypothesize that syllable learning is guided by nonspecific reinforcement signals provided by the AFP that modulate the degree of ongoing associational plasticity throughout RA (Table 2, number 4; see Sutton and Barto 1998, for an overview of reinforcement learning). This hypothesis is motivated by the fact that nonspecific reinforcement signals, while generated by a match to a sensory template, do not have to be directed toward specific patterns of RA motor neurons. As a result, no sensory → motor mapping is required to guide learning. Second, we hypothesize that synapses intrinsic to RA play an important role in storing syllable representations (Table 2, number 5). This hypothesis was motivated by the need to learn a number of discrete patterns of neural activity corresponding to the syllables in the tutor template and is consistent with estimates that up to 85% of synapses in RA come from local collaterals of other RA neurons (Herrmann and Arnold 1991). Theoretical models have shown that recurrent activity is ideal for stabilizing such patterns (e.g., Hopfield 1984). Moreover, if the representation for individual syllables is encoded in the pattern of intrinsic RA synapses, plasticity in the synapses connecting HVc and RA can alter the sequence of syllables produced, with only minor disruption to the representation for each individual syllable (see Troyer and Doupe 2000).

The third problem we address results from the competing requirements of both learning and using the efference copy signal. Learning an efference copy mapping by associating motor activity with delayed auditory feedback implies that auditory inputs induce significant levels of activity. However, when using the short-latency efference copy signals to guide syllable learning, the strong auditory inputs will interfere with the efference copy signal. We address this problem by assuming that the auditory feedback signal is relatively weak and/or that the response of HVc_AFP neurons is strongly adapting (Table 2, number 6).

Conceptual model

Our model focuses on four neural populations (Fig. 3): nucleus RA in the motor pathway, separate populations of HVc projection neurons projecting to RA and the AFP (Nordeen and Nordeen 1988), and a single population representing the output of the AFP. Because we do not explicitly model nuclei downstream of RA, activity in RA represents the motor output of the model. In this paper, we explore the functional consequences of associational plasticity in three sets of connections: HVc_RA → HVc_AFP, HVc_RA → RA, and intrinsic RA → RA connections.



Fig. 3. Network architecture. Black arrows: plastic connections. Gray arrows: nonplastic connections. AFP → RA connections transmit a reinforcement signal that modulates plasticity in RA but does not affect RA activity patterns. Plastic connections from HVc_AFP → HVc_RA and from AFP → RA (not shown) are considered in the following companion paper (Troyer and Doupe 2000).

Our model does not address the learning of syllable timing. We assume that timing is provided by rhythmically clocked bursts of premotor activity arriving in HVc_RA, with the duration of each burst controlling the duration of premotor activity and hence the length of song syllables (Fig. 2B). While the source of the premotor drive is not explicitly modeled, the song nuclei nucleus uvaeformis (Uva) and/or nucleus interfacialis (NIf) are likely candidates (McCasland 1987; Striedter and Vu 1998; Williams and Vicario 1993). Input from the forebrain nucleus medial MAN is also a possible source. Although the timing of this drive is fixed, we assume that HVc_RA neurons receive varying magnitudes of drive, and these magnitudes are generated independently for each HVc_RA neuron and each vocalization produced by the model. Thus, HVc_RA produces random patterns of premotor activity that are independent from one syllable to the next. The model's task is to use template comparison signals generated by the AFP to reorganize the connections in the motor pathway so that 1) random HVc_RA activity is converted into a handful of stereotyped patterns of RA motor activity, and 2) these stereotyped patterns of RA activity lead to vocal output matched to the memorized template. Note that HVc_RA activity becomes ordered when we address the problem of sequence learning (Troyer and Doupe 2000).
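The premotor drive assumed here can be sketched in Python. Only the fixed timing (80-ms bursts separated by 35-ms gaps) and the independence of magnitudes across neurons and syllables come from the text. The uniform distribution of magnitudes and the function names `premotor_drive` and `drive_at` are hypothetical.

```python
import numpy as np

SYLLABLE_MS = 80   # burst duration = syllable length (Fig. 2B)
GAP_MS = 35        # silent gap between bursts


def premotor_drive(n_neurons, n_syllables, rng=None):
    """Clocked bursts with fixed timing but random, independent magnitudes.

    Magnitudes are drawn uniformly on [0, 1) for each HVc_RA neuron and each
    syllable (the distribution is an assumption). Returns (onsets_ms, magnitudes)
    with magnitudes shaped (n_syllables, n_neurons).
    """
    rng = np.random.default_rng() if rng is None else rng
    onsets_ms = np.arange(n_syllables) * (SYLLABLE_MS + GAP_MS)
    magnitudes = rng.random((n_syllables, n_neurons))
    return onsets_ms, magnitudes


def drive_at(t_ms, onsets_ms, magnitudes):
    """Drive vector at time t: the current burst's magnitudes, or zeros in a gap."""
    for onset, pattern in zip(onsets_ms, magnitudes):
        if onset <= t_ms < onset + SYLLABLE_MS:
            return pattern
    return np.zeros(magnitudes.shape[1])
```

Because every burst draws a fresh pattern, consecutive syllables are uncorrelated. In this sketch, that is the only source of variability for the initially unstructured motor pathway to explore.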

PROBLEM 1: AUDITORY FEEDBACK DELAY. To address the problem of feedback delay, we hypothesize that an efference copy mapping is learned between the two populations of HVc projection neurons (Table 2, numbers 1 and 2). Since the connections in the motor pathway are initially unstructured, the random patterns of HVc_RA activity lead to a random exploration of motor space (cf. Bullock et al. 1993; Kuperstein 1988; Salinas and Abbott 1995). Activity flows down the motor pathway (McCasland 1987) and returns to HVc_AFP as auditory feedback (Fig. 4A, dark lines). While the exact form of the learning is not crucial for our model, it is important that associational learning is temporally asymmetric (Table 2, number 3), i.e., synaptic strengths increase only when presynaptic activity precedes postsynaptic activity (Bi and Poo 1998; Debanne et al. 1998; Gustafsson et al. 1987; Hebb 1949; Markram et al. 1997). By strengthening synapses onto neurons that are likely to fire in the near future, temporally asymmetric "Hebbian" learning strengthens synaptic inputs that "anticipate" any postsynaptic activity that regularly follows presynaptic spiking (cf. Blum and Abbott 1996; Gerstner and Abbott 1997). In our model, auditory feedback to HVc_AFP neurons encoding the sensory aspects of a particular vocal gesture will follow spiking in HVc_RA neurons encoding motor aspects of that gesture. Associational learning then strengthens the synapses from that (presynaptic) HVc_RA neuron onto the corresponding (postsynaptic) neurons in HVc_AFP (Fig. 4A, white arrow). After this motor → sensory mapping is learned, activity within HVc_RA motor neurons will drive, with short latency, the HVc_AFP neurons encoding the corresponding sensory representation. This short-latency motor activity in HVc_AFP constitutes a sensory prediction of the auditory reafference. This efference copy can then be passed on to the AFP and used to guide learning in RA. 
Note that efference copy learning occurs within HVc and proceeds without reference to the tutor template stored in the AFP. Using efference copy in this way splits the total feedback delay for AFP comparison signals to return to RA into two shorter delays: the auditory feedback delay of 65 ms to HVc (Fig. 4A) and the 40-ms processing delay from HVc through the AFP (Fig. 4B).
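The temporally asymmetric rule can be illustrated with a minimal Python sketch. The simplifications are ours, not the model's. Each syllable is treated as a single time step. The composite RA → vocal output → auditory → HVc_AFP pathway is represented as a fixed linear map, the hypothetical `motor_to_sensory` matrix. The Hebbian term uses mean-subtracted presynaptic activity, a covariance form of the rule. Presynaptic HVc_RA activity for syllable t is paired with the feedback it causes, which arrives later. For comparison, `causal=False` pairs that feedback with the next, unrelated premotor burst, so that postsynaptic activity precedes presynaptic activity.

```python
import numpy as np


def learn_efference_copy(motor_to_sensory, n_syllables=2000, lr=0.01, causal=True, seed=0):
    """Learn HVc_RA -> HVc_AFP weights from delayed auditory feedback.

    motor_to_sensory: (n_sensory, n_motor) array standing in for the fixed
    motor and auditory feedback pathway (hypothetical, linear).
    """
    rng = np.random.default_rng(seed)
    n_sensory, n_motor = motor_to_sensory.shape
    # Random premotor drive, independent for each neuron and each syllable.
    pre = rng.random((n_syllables, n_motor))
    # Reafference evoked in HVc_AFP by each syllable (arrives after the burst).
    feedback = pre @ motor_to_sensory.T
    # Covariance form of the Hebb rule: subtract mean presynaptic activity.
    pre_c = pre - pre.mean(axis=0)
    w = np.zeros((n_sensory, n_motor))
    for t in range(n_syllables - 1):
        if causal:
            # Burst t precedes the feedback it caused: pre-before-post pairing.
            w += lr * np.outer(feedback[t], pre_c[t])
        else:
            # Feedback from syllable t precedes burst t + 1: post-before-pre pairing.
            w += lr * np.outer(feedback[t], pre_c[t + 1])
    return w
```

With the causal pairing, the learned weights become nearly proportional to the true mapping. HVc_RA activity alone then predicts the reafference at short latency, up to a scale factor. The anti-causal pairing accumulates only noise, because premotor patterns are independent across syllables.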



Fig. 4. Two-step solution to the problem of feedback delay. A: step 1: efference copy learning. Each syllable is initiated by a random premotor drive to HVc_RA. This signal travels through the motor and auditory feedback pathways (black arrows) arriving in HVc_AFP with a delay of 65 ms. Motor nuclei downstream of RA are not explicitly modeled. Associational learning (white arrow) between premotor HVc_RA activity and HVc_AFP activity driven by auditory feedback results in an efference copy mapping. B: step 2: learning syllable representations. The efference copy is passed on to the AFP, and the match with the stored template serves as a reinforcement signal (line with round end) that modulates plasticity signals in RA. This modulation reorganizes intrinsic connections within RA, as well as the projection from HVc (white arrows).

PROBLEM 2: SYLLABLE LEARNING IN RA. To guide syllable learning, the AFP evaluates the efference copy and transmits a reinforcement signal to RA (Table 2, number 4). This nonspecific reinforcement signal is assumed to modulate the degree of ongoing associational plasticity throughout RA. An efference copy that is well-matched to the tutor song results in a large plasticity signal in RA neurons that are significantly activated, leading to a potentiation of recently activated synapses; a poor match evokes small potentiation or depression. Since a good match to the tutor song occurs when the RA neurons that encode a single tutor syllable are co-active, reinforcement leads to the development of strong connections between RA neurons encoding the same tutor syllable (Table 2, number 5). Reinforcement also reorders the connections from HVc_RA → RA (see RESULTS). These patterns of connectivity result in a strong tendency for RA to produce coherent patterns of motor activity matched to the template, i.e., the tutor syllables have become "attractors" for the neural dynamics within RA (see, e.g., Amit 1989).
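The reinforcement-gated Hebbian rule can be sketched in Python. The sketch is ours and makes several assumptions not taken from the model. The efference copy is identified with RA activity itself, with each RA assembly mapping onto one sensory feature as in Fig. 6B. RA patterns are random binary vectors. The AFP signal, the hypothetical `template_match`, is the fraction of active features belonging to the best-matching tutor syllable. Depression comes from subtracting a running-average baseline from that signal.

```python
import numpy as np

N_SYLLABLES = 5
FEATURES_PER_SYLLABLE = 8
N_FEATURES = N_SYLLABLES * FEATURES_PER_SYLLABLE  # 40 RA assemblies, as in Fig. 6B


def template_match(activity):
    """Hypothetical AFP signal: fraction of active features that belong to the
    best-matching tutor syllable (features 0-7 = A, 8-15 = B, ...)."""
    total = activity.sum()
    if total == 0:
        return 0.0
    counts = activity.reshape(N_SYLLABLES, FEATURES_PER_SYLLABLE).sum(axis=1)
    return counts.max() / total


def learn_ra_recurrent(n_trials=5000, p_active=0.2, lr=0.01, baseline_rate=0.05, seed=1):
    """Reinforcement-gated Hebbian learning of intrinsic RA -> RA weights."""
    rng = np.random.default_rng(seed)
    w = np.zeros((N_FEATURES, N_FEATURES))
    baseline = 0.0
    for _ in range(n_trials):
        # Random binary RA pattern produced by unstructured motor connections.
        a = (rng.random(N_FEATURES) < p_active).astype(float)
        r = template_match(a)
        # The nonspecific signal scales plasticity at all co-active synapses:
        # above-baseline matches potentiate, below-baseline matches depress.
        w += lr * (r - baseline) * np.outer(a, a)
        # Running average of past reinforcement serves as the baseline.
        baseline += baseline_rate * (r - baseline)
    np.fill_diagonal(w, 0.0)  # no self-connections
    return w
```

A pattern concentrated within one tutor syllable earns more reinforcement than a pattern spread across syllables. As a result, weights between assemblies coding the same syllable end up larger than weights between assemblies coding different syllables, which is the kind of connectivity that makes each tutor syllable an attractor of recurrent RA dynamics.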

PROBLEM 3: SEPARATING MOTOR AND SENSORY SIGNALS IN HVC. In our model, HVc_AFP neurons receive two distinct inputs: auditory feedback, which drives efference copy learning, and motor input from HVc_RA, which carries the efference copy used for AFP-driven song learning. While necessary for efference copy learning, the delayed auditory signal can interfere with the efference copy signal used to guide learning. We propose two strategies for separating sensory and motor signals within HVc_AFP (Table 2, number 6). First, the auditory feedback signal is assumed to be significantly weaker than the efference copy signal. Hence, auditory feedback only weakly perturbs the efference copy, which can remain sufficiently accurate to guide syllable learning. However, weak auditory feedback is able to guide efference copy learning by providing, over the course of multiple syllables, a consistent association between HVc_RA motor activity and the resulting weak sensory activation. The second strategy is based on the cancellation of auditory feedback signals in HVc_AFP by "adaptation." Specifically, adaptation in the HVc_AFP circuitry results in a "negative after-image" of any given pattern of HVc_AFP activity (Fig. 5), which has a decay time (100 ms) similar to the length of a typical song syllable (see APPENDIX for implementation). A variety of biological mechanisms could provide this kind of adaptation, e.g., spike-triggered or voltage-dependent intrinsic currents and/or slow feedback inhibition. Such mechanisms have been shown to be present within HVc (Dutar et al. 1998; Kubota and Saito 1991; Kubota and Taniguchi 1998; Schmidt and Perkel 1998). Because the efference copy arrives in HVc_AFP with a shorter delay than the auditory feedback, the after-image of the efference copy will counteract the corresponding auditory reafference. 
That is, HVc_AFP neurons strongly activated by efference copy input from HVc_RA will be in an adapted state by the time that the corresponding patterns of delayed auditory feedback arrive in HVc_AFP. Note that an inaccurate efference copy will lead to an incomplete cancellation of auditory feedback, and interference from this delayed feedback will create an inaccurate efference copy. However, associations between the uncanceled feedback signal and the HVc_RA motor activity that gave rise to it will lead to new plasticity that improves the quality of future efference copy predictions. Details of how this cancellation mechanism works in the context of our computer algorithm are presented in the RESULTS.
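The adaptation strategy can be illustrated with a small Python simulation. This is a sketch under our own assumptions, not the algorithm described in the APPENDIX. HVc_AFP units are rectified-linear, and each unit has an adaptation variable that tracks its recent activity. Once activity stops, that variable decays with a 100-ms time constant, producing the negative after-image. The adaptation gain of 2 is an arbitrary choice. The efference copy drives the units for 80 ms, and auditory feedback with a given pattern arrives 60 ms after the efference copy. The function returns each unit's summed activity during the feedback-only window, which is the uncanceled reafference.

```python
import numpy as np

TAU_ADAPT_MS = 100.0   # decay time of the negative after-image (from the text)
DT_MS = 1.0            # Euler step (assumption)
SYLLABLE_MS = 80
FEEDBACK_LAG_MS = 60   # feedback reaches HVc_AFP 60 ms after the efference copy


def residual_feedback(efference, feedback, gain=2.0):
    """Summed HVc_AFP activity per unit during the feedback-only window.

    Drive = efference copy during [0, 80) ms plus auditory feedback during
    [60, 140) ms. Adaptation subtracts a slowly decaying trace of each unit's
    own past activity.
    """
    n = efference.size
    adapt = np.zeros(n)
    total = np.zeros(n)
    t_end = FEEDBACK_LAG_MS + SYLLABLE_MS
    for step in range(int(t_end / DT_MS)):
        t = step * DT_MS
        drive = np.zeros(n)
        if t < SYLLABLE_MS:
            drive += efference
        if FEEDBACK_LAG_MS <= t < FEEDBACK_LAG_MS + SYLLABLE_MS:
            drive += feedback
        rate = np.maximum(0.0, drive - adapt)            # rectified response
        adapt += DT_MS / TAU_ADAPT_MS * (gain * rate - adapt)
        if t >= SYLLABLE_MS:                             # feedback-only window
            total += rate * DT_MS
    return total
```

When efference copy and feedback activate the same units, those units are already adapted when the feedback arrives, so the residual is substantially smaller than when the efference copy activated a different set of units. In the model, the larger residual in the mismatched case is the signal that drives new efference copy learning.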



Fig. 5. Separating efference copy and delayed auditory feedback. Given our estimates (Fig. 2A), auditory feedback will reach HVc_AFP 60 ms after the efference copy input from HVc_RA. Activity in HVc_AFP results from a mixture of these two signals (see Fig. 7 below). Adaptation mechanisms in HVc_AFP produce a delayed, negative image of HVc_AFP activity, which is subtracted from the auditory feedback. Accurate efference copy predictions can cancel auditory feedback; inaccurate predictions yield a difference signal that drives new efference copy learning.


    METHODS

The main assumptions that were necessary to construct our computational algorithm are summarized in Table 3 and are discussed below. Only the subsections explaining our method of neural encoding (see Neural encoding, Fig. 6) and the nature of HVc_AFP activity (see Tonic activity patterns, Fig. 7) are necessary for understanding the main computational results presented in the RESULTS. Other subsections describe issues of mainly theoretical interest. In the final subsection of the METHODS, we provide formulas for characterizing the developmental time course of learning in the model. Details of the computational algorithm are presented in the APPENDIX. The assumptions outlined in Table 3 are not crucial for the main predictions of our model; alternative algorithms that implement our functional hypotheses for song learning are possible. Our particular algorithm should be seen as a first approximation, one that allows us to explore associational learning between patterns of sensory and motor activity on the time scale of tens to hundreds of milliseconds.


                              
Table 3. Theoretical assumptions



Fig. 6. Encoding the problem of sensorimotor learning. A: representation of the tutor song. Ten consecutive syllables in the tutor song ( ... ABCDE ... ). For simplicity, we assume that each tutor syllable contains a nonoverlapping set of vocal features. These are numbered according to tutor syllable (features in syllable A numbered 1-8, features in B numbered 9-16, etc.). B: neural encoding and template storage. HVc_AFP and RA contain 40 assemblies, one for each of the 40 vocal features in the tutor song. The auditory feedback pathway connects each RA assembly (motor representation) with its corresponding HVc_AFP assembly (sensory representation). The AFP contains 5 assemblies, 1 for each tutor syllable. The connections from HVc_AFP to the AFP determine how vocal features are matched to tutor syllables, i.e., these connections store the template information. Connections to syllable B are shown as an example. C: motor output of the model (RA activity) for the first 10 syllables produced. Each column shows the pattern of RA activity for one particular syllable. Each row represents the activity of a particular RA assembly over the 10 syllables shown. Since connections in HVc and RA are unstructured, random patterns of premotor drive lead to RA activity that is initially unstructured. Using reinforcement signals, the model must "transfer" template information stored in sensory coordinates in the AFP to the motor pathway.



Fig. 7. Mixing of signals in HVc_AFP. Because of the 60-ms difference in latency between direct input from HVc_RA (5 ms) and auditory feedback (5 + 45 + 15 = 65 ms), separate calculations of HVc_AFP activity were made for the early (E), middle (M), late (L), and gap (G) portions of the premotor activity corresponding to each syllable. The efference copy output of HVc_AFP that is compared with the template in the AFP was calculated as the average value of HVc_AFP activity over the early and middle portions of the syllable. This activity will reach RA before the onset of the next syllable.

Each simulation consisted of repeated iterations of a computer subroutine that 1) calculated activity patterns related to a single syllable output by the model, 2) applied our synaptic plasticity rule, and 3) updated the various homeostatic mechanisms in the model. The details of the algorithm and the specification of model parameters are given in the APPENDIX. In most simulations, the subroutine was iterated for 25,000 syllables, ~5,000 more than were typically needed for model output to become stereotyped. When performance was degraded by changing parameters (see APPENDIX), simulations were extended to 50,000 syllables, but even then the output sometimes lacked stereotypy. Computer simulations were written using the MATLAB simulation environment (version 5.3; The MathWorks, Natick, MA). Typical simulations took ~2 h when run using a 400-MHz Pentium II processor.

Neural encoding

Activity in the model was represented by the output of a number of neural "units." Each of these units is meant to represent the activity within a network of connected neurons or "cell assembly" (Hebb 1949). Hereafter, we will use the term "assemblies." Given the lack of data concerning the neural code for vocal gestures in the song system, we sought the simplest encoding scheme that could support associational learning (Table 3, number 1). Each vocal gesture produced by the model is viewed as a combination of 40 abstract "vocal features," with each RA assembly representing motor-related aspects of one feature, and each HVc_AFP assembly representing sensory-related aspects of one feature. Because of this one-to-one mapping of auditory and motor features, motor activity in a given RA assembly leads to auditory feedback input to the unique corresponding assembly in HVc_AFP. The tutor song consisted of five syllables, within the normal range for zebra finch song (3-9 syllables; Price 1979). We denote these syllables by the letters A-E and assume that each tutor syllable is encoded by a distinct set of assemblies, which allows us to number vocal features consecutively, i.e., tutor syllable A contains vocal features 1-8, tutor syllable B contains features 9-16, etc. (Fig. 6A). The tutor template is stored in the AFP, with tutor syllables encoded in the connections from HVc_AFP: each AFP assembly corresponds to a single tutor syllable and receives input from the HVc_AFP assemblies representing the auditory features comprising that syllable (Fig. 6B). Connections related to syllable B are shown as an example.
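The encoding can be made concrete with a small sketch. Python is used here only for illustration; the original simulations were written in MATLAB. The code below builds the 5 × 40 template matrix from HVc_AFP to the AFP and the one-to-one auditory feedback mapping. It assumes unit connection strengths, whereas the actual strengths are set in the APPENDIX. All names (`template`, `feedback_map`, `syllable_of_feature`) are hypothetical.

```python
import numpy as np

N_FEATURES = 40                               # vocal features (RA and HVc_AFP assemblies)
N_SYLLABLES = 5                               # tutor syllables A-E (AFP assemblies)
FEATURES_PER_SYL = N_FEATURES // N_SYLLABLES  # 8 features per syllable

def syllable_of_feature(k: int) -> str:
    """Tutor syllable for 0-based feature index k (0-7 -> 'A', 8-15 -> 'B', ...)."""
    return "ABCDE"[k // FEATURES_PER_SYL]

# Template storage: AFP assembly s receives input from the HVc_AFP assemblies
# encoding the features of tutor syllable s (rows = AFP, columns = HVc_AFP).
template = np.zeros((N_SYLLABLES, N_FEATURES))
for k in range(N_FEATURES):
    template[k // FEATURES_PER_SYL, k] = 1.0

# Auditory feedback: RA assembly k drives only HVc_AFP assembly k.
feedback_map = np.eye(N_FEATURES)

# A perfect rendition of tutor syllable B activates features 9-16 (0-based 8-15).
ra_activity = np.zeros(N_FEATURES)
ra_activity[8:16] = 1.0
afp_input = template @ (feedback_map @ ra_activity)
print(afp_input)  # only the entry for syllable B is nonzero (8.0)
```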
Our choice of this very simple representation was guided by the following considerations: 1) due to the complexity of the network and finite computational resources, our model contains only a limited number of assemblies; 2) since learning correlated patterns with Hebbian learning rules is a largely unsolved theoretical problem, we chose an encoding scheme in which uncorrelated patterns of motor activity result in uncorrelated patterns of sensory feedback; 3) our encoding scheme ensures decorrelation in the motor → sensory mapping even for assemblies using nonlinear input-output functions.

Initially, all connections in the motor pathway are unstructured. Thus, random activity in HVc_RA leads to random motor activity in RA (Fig. 6C). The model's task is to 1) compare sensory signals with the template stored in the AFP, and 2) use the result of this comparison to guide plasticity in the motor pathway so that random HVc_RA activity is converted to stereotyped patterns of RA activity matched to the tutor song.

Tonic activity patterns

For simplicity, we assume that song-related activity is encoded by neural firing rates averaged over the course of each song syllable. Thus, the activity within each of the four neural populations is modeled as a vector of firing rates, with one entry for each assembly in the population. For all populations except HVc_AFP, firing rates are assumed to be constant during the period of premotor drive for each syllable and zero during the gap between syllables. In HVc_AFP, we divided each syllable into four time epochs according to the combination of efference copy input (related to the current syllable) and auditory feedback input received during that syllable (Fig. 7). During the early part of each syllable (marked E), HVc_AFP receives efference copy input from HVc_RA that relates to the current syllable, while the sensory input is due to delayed auditory feedback from the previous syllable. The middle portion of each syllable (marked M) corresponds to the period of silence in the delayed feedback; during this period, HVc_AFP receives efference copy input only. During the late part of the syllable (marked L), the efference copy and auditory inputs correspond to the same syllable. Finally, during the "gap" period between bursts of HVc_RA activity (marked G), HVc_AFP receives only auditory input. During the epochs in which efference copy and auditory feedback inputs overlap, the two sources of input were simply summed. For computational and conceptual simplicity, we chose not to propagate this subdivision of activity to the AFP. The efference copy activity passed on to the AFP was calculated from the average activity in HVc_AFP during the early and middle portions of the syllable. Late and gap portions were excluded for the following reasons. RA activity generated during the current syllable contributes to the late and gap portions of HVc_AFP activity.
In our sequence learning model (Troyer and Doupe 2000), the AFP not only provides a reinforcement signal to RA, but also affects the pattern of RA activity. Excluding the late and gap portions of HVc_AFP activity from the efference copy prevents RA output from contributing to RA input during the same syllable via the RA → HVc_AFP → AFP → RA feedback loop. It also prevents auditory feedback from the current syllable from contributing acutely to the AFP reinforcement signal. We will view the combined early and middle activity signal as the efference copy passed on to the AFP, although it may include auditory feedback from the previous syllable.
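The epoch bookkeeping can be sketched in a few lines. The sketch assumes that HVc_AFP activity in each epoch can be represented by its summed input, ignoring thresholds and adaptation, and that the early and middle epochs are weighted equally when averaged. The weight `w_early` is a hypothetical stand-in for the relative epoch durations, which are specified in the APPENDIX.

```python
import numpy as np

def hvc_afp_epochs(ec_now, fb_prev, fb_now):
    """Summed HVc_AFP input in each epoch of one syllable (the two sources simply add)."""
    return {
        "E": ec_now + fb_prev,  # current efference copy + feedback from previous syllable
        "M": ec_now,            # silent period in the delayed feedback
        "L": ec_now + fb_now,   # efference copy and feedback from the same syllable
        "G": fb_now,            # gap between HVc_RA bursts: auditory input only
    }

def efference_copy_to_afp(epochs, w_early=0.5):
    """Signal passed to the AFP: weighted average of the early and middle epochs.

    w_early is a hypothetical weight standing in for the relative epoch durations;
    the late and gap epochs are excluded.
    """
    return w_early * epochs["E"] + (1.0 - w_early) * epochs["M"]

ec_now = np.array([1.0, 0.0])   # efference copy for the current syllable
fb_prev = np.array([0.0, 0.2])  # delayed feedback from the previous syllable
fb_now = np.array([0.2, 0.0])   # feedback from the current syllable
ec_signal = efference_copy_to_afp(hvc_afp_epochs(ec_now, fb_prev, fb_now))
print(ec_signal)  # values 1.0 and 0.1
```

The late and gap epochs are dropped, so feedback from the current syllable (`fb_now`) cannot reach the AFP signal. Feedback from the previous syllable still leaks in through the early epoch, as noted above.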

Plasticity rule

We used a simple model of associational learning. Synaptic projections are in principle "all-to-all," i.e., associational learning takes place between all relevant combinations of pre- and postsynaptic assemblies. Assemblies become functionally disconnected when associational learning drives connection strengths to zero. While our learning rule is meant to encompass the many potential mechanisms of associational plasticity in the song system, the form of our learning rule is based on analogies with NMDA receptor-dependent long-term potentiation (LTP; Malenka and Nicoll 1993; Table 3, number 3). In the equation below we use $r^{\mathrm{pre}}(t)$ and $r^{\mathrm{post}}(t)$ to denote the activity levels of the pre- and postsynaptic assemblies at time $t$. Each presynaptic spike (at time $t^{\mathrm{pre}}$) was assumed to give rise to a postsynaptic "plasticity trace," $\alpha$, analogous to the amount of NMDA-receptor binding. The shape of the function $\alpha$ determines the time window for neural plasticity (see APPENDIX). This plasticity trace is multiplied by postsynaptic activity to yield a "plasticity signal," $\alpha(t - t^{\mathrm{pre}})\,r^{\mathrm{post}}(t)$, analogous to postsynaptic calcium concentration. Input from the AFP is assumed to give a reinforcement signal $R$ that modulates the plasticity signal in all RA assemblies. ($R$ is set to a constant value of 1 in HVc.) Plasticity signals above a threshold value $\psi$ increase synaptic strength (LTP); signals below $\psi$ give rise to long-term depression (LTD; Cummings et al. 1996; Hansel et al. 1997; Lisman 1989). The threshold $\psi$ is a "sliding threshold" that depends on the average amount of activity in the postsynaptic cell (Abraham and Bear 1996; Bienenstock et al. 1982; Sejnowski 1977). Thus, the change in synaptic strength resulting from postsynaptic activity at time $t$ and presynaptic activity at time $t^{\mathrm{pre}}$ is proportional to the following quantity (see APPENDIX):
(reinforcement × plasticity trace × post − threshold) × pre

$$= \left[R\,\alpha(t - t^{\mathrm{pre}})\,r^{\mathrm{post}}(t) - \psi\right] r^{\mathrm{pre}}(t^{\mathrm{pre}})$$
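The rule can be sketched for tonic (syllable-averaged) activity vectors. In that setting, the weight change for every pre/post pair is an outer product. This is a sketch under stated assumptions, not the APPENDIX algorithm. The exponential shape of $\alpha$ (`tau_ms`), the learning rate `eta`, and the running-average form of the sliding threshold (`update_threshold`) are all hypothetical choices. In HVc, `R` would simply be passed as 1.

```python
import numpy as np

def alpha(dt_ms, tau_ms=50.0):
    """Hypothetical plasticity trace: exponential decay after a presynaptic spike, zero before it."""
    dt_ms = np.asarray(dt_ms, dtype=float)
    return np.where(dt_ms >= 0.0, np.exp(-np.maximum(dt_ms, 0.0) / tau_ms), 0.0)

def weight_change(r_pre, r_post, dt_ms, R, psi, eta=0.01):
    """dW[i, j] = eta * (R * alpha(dt) * r_post[i] - psi[i]) * r_pre[j].

    Rows index postsynaptic and columns presynaptic assemblies; psi holds one
    LTP/LTD threshold per postsynaptic assembly.
    """
    signal = R * alpha(dt_ms) * np.asarray(r_post, dtype=float) - np.asarray(psi, dtype=float)
    return eta * np.outer(signal, r_pre)

def update_threshold(psi, r_post, rate=0.01, gain=1.0):
    """Sliding threshold: running average of (gain x) postsynaptic activity."""
    return (1.0 - rate) * np.asarray(psi, dtype=float) + rate * gain * np.asarray(r_post, dtype=float)

# Presynaptic assembly 0 active; postsynaptic assembly 0 strongly active, 1 weakly active.
dW = weight_change(r_pre=[1.0, 0.0], r_post=[2.0, 0.1], dt_ms=0.0, R=1.0, psi=[0.5, 0.5])
# dW[0, 0] > 0 (LTP), dW[1, 0] < 0 (LTD), column 1 is zero (inactive presynaptic assembly)
```

Setting `R=0` leaves only the $-\psi\,r^{\mathrm{pre}}$ term, so unreinforced activity in RA produces pure depression in this sketch.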

Local circuit mechanisms

Activity within each neural population was based on very simple local circuitry. The output of each excitatory cell assembly was computed as a linear function of its input after subtracting a threshold value. RA included intrinsic excitatory connections that were used to store syllable representations in a manner analogous to other associative memory models or so-called attractor networks (Table 3, number 4A; Amit 1989). To minimize computation, only RA includes such connections. Each population also includes a single inhibitory assembly that is connected to all assemblies within the corresponding population (Table 3, number 4B). Inhibition is of two basic types. HVc_RA, HVc_AFP, and the AFP use "feedforward inhibition," in which inhibitory activity is equal to the average afferent input received by the population, minus a threshold. RA uses "feedback inhibition," in which inhibitory activity is driven by the average activity within the local population. Feedback inhibition allows tighter control of the activity in the local network but is computationally more expensive. Within a given population, all excitatory assemblies receive a similar level of inhibition. Since the only assemblies that get strongly activated are those that receive enough input to overcome this inhibition, inhibition mediates a form of "competition" among excitatory assemblies.
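Feedforward inhibition, and the competition it produces, can be illustrated briefly. The sketch assumes rectified (threshold-linear) excitatory and inhibitory outputs, with both thresholds set to zero by default. The function name and parameters are hypothetical. Feedback inhibition in RA is not shown, because it requires iterating the network dynamics.

```python
import numpy as np

def relu(x):
    """Rectification: negative values are set to zero."""
    return np.maximum(x, 0.0)

def feedforward_inhibition_layer(afferent, theta_exc=0.0, theta_inh=0.0):
    """Threshold-linear assemblies sharing one feedforward inhibitory assembly.

    The inhibitory assembly's activity is the mean afferent input minus a threshold,
    and every excitatory assembly receives that same inhibition.
    """
    afferent = np.asarray(afferent, dtype=float)
    inhibition = relu(afferent.mean() - theta_inh)
    return relu(afferent - inhibition - theta_exc)

out = feedforward_inhibition_layer([1.0, 2.0, 3.0, 4.0])
# Mean input is 2.5, so only the two assemblies with above-average input stay active.
```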

Decorrelating initial connectivity

Our simulations were designed to determine whether associational learning, guided by template matching signals from the AFP, could organize initially unstructured connections in the motor pathway to produce the stored tutor song. The dominant computational problem encountered in building the model was the positive feedback inherent in associational learning rules: correlated activity increases synaptic strength, which tends to further strengthen the correlation. Left unchecked, this learning will continually amplify initially weak associations, even spurious associations resulting from chance events. One of the most important factors contributing to spurious correlations was the limited size of our network simulations. The strength of random correlations is highly dependent on network size, falling roughly in inverse proportion to the square root of the number of network units. Because of the computational expense of simulating intrinsic feedback dynamics within RA, we limited the number of RA assemblies to 40 (Fig. 6B). Independently choosing each connection within such a network will result in correlations that are roughly an order of magnitude ($\sqrt{4000/40} = 10$) stronger than those expected in a more realistically sized network containing 4000 assemblies. The calculation of HVc_RA activity was computationally less expensive, and a larger number (5 × 40 = 200) of HVc_RA assemblies was included. While reducing correlations to some degree, these numbers still do not approach physiologically realistic numbers. We note here that the greater storage capacity of larger networks resulting from a reduction in random correlations (Amit 1989) may relate to reports of a relationship between the size of various song nuclei and the number of song syllables learned (reviewed in Brenowitz 1997; Nordeen and Nordeen 1997).
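The scaling argument can be checked numerically. The sketch below assumes independent, uniformly distributed connection strengths. It estimates the spread of the correlation between two random connection vectors for networks of 40 and 4000 units, and the ratio should come out near $\sqrt{4000/40} = 10$. The function name and trial count are illustrative.

```python
import numpy as np

def random_correlation_spread(n_units, n_trials=500, seed=0):
    """Std. dev. of the correlation between two independent random connection vectors."""
    rng = np.random.default_rng(seed)
    x = rng.random((n_trials, n_units))
    y = rng.random((n_trials, n_units))
    x -= x.mean(axis=1, keepdims=True)
    y -= y.mean(axis=1, keepdims=True)
    cc = (x * y).sum(axis=1) / np.sqrt((x**2).sum(axis=1) * (y**2).sum(axis=1))
    return cc.std()

small = random_correlation_spread(40)    # RA-sized network in the model
large = random_correlation_spread(4000)  # more realistically sized network
print(round(small, 3), round(large, 4), round(small / large, 1))
```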

To address the problem of correlated connections, we chose initial patterns of connectivity specifically aimed at minimizing these correlations (Table 3, number 5). Initial connection strengths were chosen according to two basic strategies. For HVc_RA → RA and HVc_RA → HVc_AFP connections, we used a "single-projection" strategy, in which each presynaptic assembly connects with a single postsynaptic assembly. This ensures that the levels of input received by any two assemblies in the postsynaptic population are independent. However, the single-projection strategy does not prevent correlations arising from polysynaptic pathways within the recurrent circuitry in RA. For these intrinsic RA connections, we used a "uniform" strategy, in which each presynaptic assembly connects with all postsynaptic assemblies with equal strength. This ensures that all correlations result from a global signal shared by all assemblies. While such a signal will increase overall synaptic strengths, it will not lead to spurious patterns of correlations within the network. To ensure that our model was robust to some degree of correlation, zero-mean Gaussian perturbations were added to all plastic connections during the initialization process. The standard deviation of the perturbations was set to 10% of the strength of the nonzero synapses. After the perturbation, negative strengths were set to zero. Noise was not added to the three projections that did not undergo plasticity (the premotor drive, auditory feedback, and template storage connections from HVc_AFP to the AFP).
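The two initialization strategies can be sketched as follows. Several assumptions are not stated in the text: unit nonzero strengths, a balanced assignment in which each RA (or HVc_AFP) assembly receives exactly 200/40 = 5 HVc_RA projections, and the random seed. Rows index postsynaptic and columns presynaptic assemblies. The function names are hypothetical.

```python
import numpy as np

def single_projection(n_post, n_pre, rng, strength=1.0):
    """Each presynaptic assembly contacts exactly one postsynaptic assembly.

    Hypothetical balanced assignment: every postsynaptic assembly gets
    n_pre // n_post presynaptic partners, in random order.
    """
    targets = rng.permutation(np.repeat(np.arange(n_post), n_pre // n_post))
    w = np.zeros((n_post, n_pre))
    w[targets, np.arange(n_pre)] = strength
    return w

def uniform_recurrent(n, strength=1.0):
    """All-to-all connections of equal strength, with no self-connections."""
    w = np.full((n, n), strength)
    np.fill_diagonal(w, 0.0)
    return w

def perturb(w, rng, frac=0.10):
    """Add zero-mean Gaussian noise (s.d. = frac x nonzero strength), then clip negatives to zero."""
    sd = frac * w[w > 0].mean()
    return np.clip(w + rng.normal(0.0, sd, size=w.shape), 0.0, None)

rng = np.random.default_rng(1)
w_hvc_ra = perturb(single_projection(40, 200, rng), rng)   # HVc_RA -> RA
w_hvc_afp = perturb(single_projection(40, 200, rng), rng)  # HVc_RA -> HVc_AFP
w_ra_ra = perturb(uniform_recurrent(40), rng)              # RA -> RA
np.fill_diagonal(w_ra_ra, 0.0)  # keep self-connections absent after adding noise
```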

Homeostatic mechanisms

In addition to decorrelating initial connectivity patterns, we include two sources of homeostatic negative feedback to counteract the positive feedback inherent in associational learning (Table 3, number 6). The first is a normalization of synaptic strength: after applying associational change for each simulated syllable, the strengths of all synapses onto (or from) a given assembly are multiplied by a single number so that the total amount of postsynaptic (or presynaptic) strength for any one assembly remains nearly constant (see APPENDIX). This kind of multiplicative normalization controls total synaptic strength without altering the relative magnitude of the individual connections. Presynaptic normalization was applied before postsynaptic normalization (see APPENDIX). The strengths to which synaptic connections were normalized were chosen by hand so that 1) intrinsic RA circuitry contributed a large component (50%) of the input to RA assemblies, and 2) auditory feedback contributed a modest portion (20%) of the input to HVc_AFP. The mechanisms underlying homeostasis are just now beginning to receive focused attention. Multiplicative normalization of synaptic strength has been demonstrated by Turrigiano et al. (1998) and was hypothesized to depend on mean levels of activity. An approximation to our postsynaptic normalization rule follows if mean levels of activity (calculated on long time scales) are related to the total excitatory strength synapsing on that neuron. Mechanisms such as conservation of released transmitter and/or retrograde trophic factors could underlie presynaptic normalization.

The second source of negative feedback is inhibitory plasticity that is homeostatic, i.e., if an excitatory assembly becomes too active, the inhibitory connection onto that assembly is strengthened (Rutherford et al. 1997; see APPENDIX). We note that controlling feedback in the model was not always straightforward, since oscillatory instability results if negative and positive feedback mechanisms operate on similar time scales.
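Both homeostatic mechanisms can be sketched briefly under stated assumptions. The target totals are hypothetical. Inhibitory plasticity is modeled as a linear update toward a hypothetical target rate `r_target`, with inhibitory weights clipped at zero. Presynaptic normalization rescales columns and postsynaptic normalization then rescales rows. Because rows are rescaled last, postsynaptic totals are met exactly and presynaptic totals only approximately.

```python
import numpy as np

def normalize(w, pre_total=None, post_total=None):
    """Multiplicative normalization: presynaptic (column) sums first, then postsynaptic (row) sums.

    Rows index postsynaptic and columns presynaptic assemblies. Each step rescales a
    whole column or row, so ratios within that column or row are preserved.
    """
    w = np.array(w, dtype=float)
    if pre_total is not None:
        col = w.sum(axis=0, keepdims=True)
        w *= np.divide(pre_total, col, out=np.ones_like(col), where=col > 0)
    if post_total is not None:
        row = w.sum(axis=1, keepdims=True)
        w *= np.divide(post_total, row, out=np.ones_like(row), where=row > 0)
    return w

def update_inhibition(w_inh, r_exc, r_target, rate=0.01):
    """Homeostatic inhibitory plasticity: inhibition onto an assembly grows when its
    activity exceeds r_target and shrinks (never below zero) when it falls short."""
    w_inh = np.asarray(w_inh, dtype=float)
    return np.maximum(w_inh + rate * (np.asarray(r_exc, dtype=float) - r_target), 0.0)

w = normalize([[1.0, 2.0], [3.0, 4.0]], post_total=1.0)  # rows now sum to 1
w_inh = update_inhibition([0.5, 0.5], r_exc=[2.0, 0.0], r_target=1.0)
```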

Quantifying learning time course

To quantify the learning time course, we divided the model output into 250-syllable epochs and computed the matrix $M^{\mathrm{act}}$ of co-fluctuations in activity between each pair of RA assemblies over each epoch. During the $m$th epoch,
$$M^{\mathrm{act}}_{ij} = \frac{1}{250} \sum_{n=1+250(m-1)}^{250m} \left[r_i(n) - \bar{r}(n)\right]\left[r_j(n) - \bar{r}(n)\right]$$
where $r_i(n)$ is the activity level of the $i$th RA assembly and $\bar{r}(n)$ is the average activity across assemblies during syllable $n$. We compared $M^{\mathrm{act}}$ to an ideal syllable matrix, $M^{\mathrm{syl}}$, that describes how assemblies are grouped into the syllables of the tutor song. Following Fig. 6, the 40 vocal features were grouped into five syllables, indexed as follows: syllable A, 1-8; B, 9-16; C, 17-24; D, 25-32; E, 33-40. $M^{\mathrm{syl}}_{ij} = 4$ if $i$ and $j$ belong to the same syllable, and $M^{\mathrm{syl}}_{ij} = -1$ if $i$ and $j$ belong to different syllables. This is the matrix of co-fluctuations that would be obtained from the tutor song depicted in Fig. 6A if each assembly had an average activity level of 1. Matrices were compared by taking the correlation coefficient (CC) between the entries of the two matrices. The CC between any two $N \times M$ matrices $A$ and $B$ was defined as follows. First, the mean value is subtracted from each element: $\hat{A}_{ij} = A_{ij} - \frac{1}{NM}\sum_{ij} A_{ij}$ and $\hat{B}_{ij} = B_{ij} - \frac{1}{NM}\sum_{ij} B_{ij}$. Then
$$\mathrm{CC}(A, B) = \frac{\sum_{ij} \hat{A}_{ij}\hat{B}_{ij}}{\left[\left(\sum_{ij} \hat{A}_{ij}^{2}\right)\left(\sum_{ij} \hat{B}_{ij}^{2}\right)\right]^{1/2}}$$
Diagonal entries were excluded, i.e., all summations were taken over indices $i \neq j$.

The CC was also used to monitor the connectivity appropriate for the efference copy mapping and for the intrinsic RA connectivity that underlies syllable encoding. For the efference copy, we measured the CC between the matrix of motor → sensory connection strengths from HVc_RA → HVc_AFP and the HVc_RA → RA connections in the motor pathway. To quantify the development of syllable-based connectivity, we calculated the CC between $M^{\mathrm{syl}}$ and the matrix of intrinsic RA connection strengths, again excluding diagonal entries.
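The time-course measures can be sketched directly from the definitions above. As a check on the choice of $M^{\mathrm{syl}}$, the example feeds in a perfect tutor song. Syllables A-E cycle through a 250-syllable epoch, and active assemblies sit at level 5, so each assembly's time-averaged activity is 1. The co-fluctuation matrix then has entries of exactly 4 and -1. The function names are hypothetical. `exclude_diagonal=False` would be the setting for the non-square (40 × 200) connection matrices compared for the efference copy.

```python
import numpy as np

def co_fluctuation(r):
    """M^act for one epoch; r has shape (syllables in epoch, RA assemblies)."""
    dev = r - r.mean(axis=1, keepdims=True)  # r_i(n) - rbar(n)
    return dev.T @ dev / r.shape[0]

def ideal_syllable_matrix(n_syl=5, per_syl=8):
    """M^syl: 4 for pairs in the same tutor syllable, -1 for pairs in different ones."""
    labels = np.repeat(np.arange(n_syl), per_syl)
    return np.where(labels[:, None] == labels[None, :], 4.0, -1.0)

def cc(a, b, exclude_diagonal=True):
    """Correlation coefficient between matrix entries (off-diagonal entries by default)."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    mask = ~np.eye(*a.shape, dtype=bool) if exclude_diagonal else np.ones(a.shape, dtype=bool)
    a_hat = a[mask] - a[mask].mean()
    b_hat = b[mask] - b[mask].mean()
    return (a_hat @ b_hat) / np.sqrt((a_hat @ a_hat) * (b_hat @ b_hat))

# A perfect rendition of the tutor song: A, B, C, D, E repeated for 250 syllables.
song = np.zeros((250, 40))
for n in range(250):
    s = n % 5
    song[n, 8 * s : 8 * (s + 1)] = 5.0

m_act = co_fluctuation(song)
m_syl = ideal_syllable_matrix()
score = cc(m_act, m_syl)  # equals 1 for a perfect rendition
```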


    RESULTS

In presenting the results of our computational model, we focus on the quantitative data produced by a single representative simulation. This allows a step-by-step illustration of song development and demonstrates the mutual consistency of the functional hypotheses described above. After presenting these results, we show how the model reacts to changes in important parameters.

Problem 1: Auditory feedback delay

The first step in our proposed solution to the problem of feedback delay is the learning of an efference copy mapping. At the beginning of each simulation, connections in the motor pathway are unstructured and random HVc_RA activity leads to random patterns of RA output (Fig. 6C). Efference copy learning results from associations between the random patterns of HVc_RA activity and HVc_AFP activity induced by auditory reafference (Fig. 4A). We examined the development of an efference copy map in two ways (Fig. 8). First, an accurate map should cause efference copy activity to match the auditory reafference. Figure 8A shows the pattern of vocal output (left column of each pair, marked V) and HVc_AFP efference copy activity (right column of each pair, marked EC) for five syllables spanning the period of initial efference copy learning. Note that, because of our simple encoding scheme, vocal output, RA motor activity, and auditory feedback have equivalent representations (see METHODS). Initially, both patterns of activity are highly distributed and unrelated (Fig. 8A, left pairs). As efference copy learning progresses, the activity remains distributed (significant syllable learning has not taken place), but the efference copy activity is highly correlated with the vocal output (Fig. 8A, right pair). Note that a perfect match is not required (Jordan and Rumelhart 1992); the efference copy estimate only has to be accurate enough that, on average, the AFP will reinforce the proper correlations in RA (see Fig. 9, bottom). The accuracy of the efference copy mapping can also be measured by determining the similarity between the mapping of HVc_RA onto motor features in RA and the efference copy mapping of HVc_RA onto sensory features in HVc_AFP. Figure 8B shows the correlation coefficient (see METHODS) between the connection strengths from HVc_RA → RA and those from HVc_RA → HVc_AFP.
By syllable 500, the efference copy correlation had reached 0.81, 84% of the maximum value (0.96) reached during the simulation.
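The core of efference copy learning can be sketched in a stripped-down form: Hebbian association of random HVc_RA activity with the auditory feedback that activity causes. This is not the full model. RA dynamics, thresholds, adaptation, normalization, and the plasticity trace are omitted, and the motor map is held fixed. The sparseness `p_active`, learning rate `eta`, number of syllables, and seed are hypothetical. In this toy setting, the CC between the learned HVc_RA → HVc_AFP map and the HVc_RA → RA map starts near zero and rises well above 0.9.

```python
import numpy as np

def cc(a, b):
    """Correlation coefficient between all entries of two equally shaped matrices."""
    a_hat, b_hat = a - a.mean(), b - b.mean()
    return float((a_hat * b_hat).sum() / np.sqrt((a_hat**2).sum() * (b_hat**2).sum()))

rng = np.random.default_rng(0)
n_hvc, n_feat, p_active, eta = 200, 40, 0.1, 0.1

# Fixed motor map HVc_RA -> RA (single projection, 5 HVc_RA assemblies per RA assembly).
targets = rng.permutation(np.repeat(np.arange(n_feat), n_hvc // n_feat))
w_motor = np.zeros((n_feat, n_hvc))
w_motor[targets, np.arange(n_hvc)] = 1.0

# Efference copy map HVc_RA -> HVc_AFP starts small and unstructured.
w_ec = 0.01 * rng.random((n_feat, n_hvc))

history = []
for n in range(5000):
    if n % 1000 == 0:
        history.append(cc(w_ec, w_motor))                 # track map quality before each block
    r_hvc = (rng.random(n_hvc) < p_active).astype(float)  # random premotor drive
    feedback = w_motor @ r_hvc                            # RA activity, fed back one-to-one
    w_ec += eta * np.outer(feedback, r_hvc)               # Hebbian association
history.append(cc(w_ec, w_motor))
```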



Fig. 8. Efference copy learning. A: vocal output (V, equivalent to RA activity) and efference copy activity (EC) for syllables 1, 251, 501, 1001, and 2001. Efference copy activity is determined as the average of HVc_AFP activity over the early and middle portions of each syllable (Fig. 7). Initially, vocal output and efference copy activity are uncorrelated. By syllable 2000, activity is still not organized according to tutor syllable (syllable learning has not taken place), but efference copy activity and vocal output are similar. B: development of efference copy connectivity. Correlation coefficient between the matrix of HVc_RA projections onto motor features in RA and onto sensory features in HVc_AFP (see METHODS, Quantifying learning time course, for definition).



Fig. 9. Reinforcement signal. A: vocal output (V) and efference copy (EC) for syllables 11,000-11,010. B: reinforcement signal calculated from efference copy shown in A. EC activity concentrated within assemblies encoding a single tutor syllable led to large reinforcement signals (syllable 11,006, D; syllable 11,009, B). Activity shared by two tutor syllables led to smaller reinforcement (11,000, D/C; 11,003, B/A). During syllable 11,007, the model produced a reasonably good rendition of tutor syllable D, but minimal reinforcement was given because the efference copy prediction was inaccurate.

Problem 2: Syllable learning in RA

CALCULATION OF THE REINFORCEMENT SIGNAL. The AFP guides syllable learning by transmitting a nonspecific reinforcement signal that uniformly modulates plasticity in all RA assemblies. To calculate the match to the template, each AFP assembly sums the input from HVc_AFP assemblies encoding a distinct tutor syllable (Fig. 6B). The competition mediated by mutual inhibition in the AFP ensures that significant activation of the AFP occurs only if HVc_AFP activity is mostly confined to assemblies corresponding to one (or a few) tutor syllables. The final reinforcement value was obtained by thresholding each AFP assembly's output and summing these thresholded outputs (see APPENDIX for details).
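A sketch of this calculation makes the ordering of reinforcement values explicit. It assumes unit template weights and feedforward inhibition equal to the mean AFP input, as for the other feedforward-inhibited populations. The output threshold `theta_r` is hypothetical, and the actual values are in the APPENDIX. The sketch reproduces the qualitative ordering described for Fig. 9. Concentrated efference copy activity earns more reinforcement than activity split across two syllables, and uniformly distributed activity earns none.

```python
import numpy as np

N_SYL, PER_SYL = 5, 8
TEMPLATE = np.kron(np.eye(N_SYL), np.ones((1, PER_SYL)))  # AFP x HVc_AFP, shape (5, 40)

def reinforcement(ec, theta_r=2.0, theta_inh=0.0):
    """Scalar reinforcement computed from a 40-element efference-copy vector."""
    afp_input = TEMPLATE @ np.asarray(ec, dtype=float)    # template match per tutor syllable
    inhibition = max(afp_input.mean() - theta_inh, 0.0)   # shared feedforward inhibition
    afp_out = np.maximum(afp_input - inhibition, 0.0)     # competition among AFP assemblies
    return float(np.maximum(afp_out - theta_r, 0.0).sum())  # threshold each output, then sum

single = np.zeros(40)
single[24:32] = 1.0       # activity confined to tutor syllable D
split = np.zeros(40)
split[0:16] = 0.5         # same total activity shared by A and B
uniform = np.full(40, 0.2)  # same total activity spread over all features
values = [reinforcement(single), reinforcement(split), reinforcement(uniform)]
```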

The outcome of this procedure is shown in Fig. 9. Figure 9A shows the vocal output (marked V) and efference copy (marked EC) for 11 consecutive syllables sung during the period of syllable learning. The black bars show the reinforcement signal. This reinforcement is obtained by evaluating the HVc_AFP efference copy activity on the right of each pair, but it is used to modulate associational learning for the RA motor activity generating the vocal output shown on the left. Large reinforcement is obtained when efference copy activity is concentrated within assemblies encoding a single tutor syllable (e.g., syllables 11,006 and 11,009). Smaller reinforcement signals are computed when HVc_AFP activity is distributed among assemblies encoding two syllables (e.g., syllables 11,000 and 11,003). Note that the 11,007th syllable produced by the model was dominated by the motor assemblies encoding D, but the AFP signaled minimal reinforcement because of an inaccurate efference copy representation.

SYNAPTIC REORGANIZATION. Reinforcement-guided syllable learning is shown in Fig. 10. Initially, RA → RA connection strengths were set to be nearly equal (A, middle), minimizing the presence of randomly correlated connections that would have to be "unlearned" (see METHODS). Note that self-connections are not included in our model (diagonal entries are zero), since strong self-correlations would tend to dominate associational learning. Unstructured input from HVc_RA (A, left) resulted in random patterns of RA activity (A, right). Because AFP-mediated reinforcement 1) is greatest when assemblies corresponding to a common tutor syllable are co-active, and 2) results in large increases in synaptic strength onto active RA assemblies, RA assemblies began to develop strong connections with other RA assemblies encoding the same syllable (B, middle). Reinforcement also guided learning within the HVc → RA projection, causing RA assemblies encoding the same tutor syllable to receive input from similar sets of HVc_RA assemblies and thus to receive correlated patterns of HVc input (B, left). Both the recurrent circuitry and HVc_RA input led to RA activity partially matched to the tutor syllables (B, right). After learning was complete, HVc_RA input was a mixture of tutor syllable representations (C, left). Strong intrinsic circuitry (C, middle) amplifies the activity within assemblies encoding the most strongly driven syllable, and inhibitory competition suppresses other responses (see METHODS and APPENDIX). As a result, the model produced motor output perfectly matched to the syllables in the tutor song (C, right). Because HVc_RA continues to be driven by the random premotor drive, syllables are produced in a random sequence. Sequence learning will be addressed in our companion paper (Troyer and Doupe 2000).
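The winner-take-all behavior of the learned RA circuit can be sketched with a rate model. It combines block-diagonal excitation among each syllable's assemblies with global feedback inhibition proportional to mean RA activity. The connection strength `w_total`, inhibitory gain `g`, time step, and input levels are hypothetical. They were chosen so that within-syllable excitation exceeds the leak (`w_total > 1`). That condition is what lets the more strongly driven syllable suppress the other one, rather than both settling at intermediate levels.

```python
import numpy as np

N_SYL, PER_SYL = 5, 8
N = N_SYL * PER_SYL
labels = np.repeat(np.arange(N_SYL), PER_SYL)

def learned_ra_weights(w_total=1.5):
    """Block-diagonal RA -> RA weights: each assembly excites the 7 others in its syllable."""
    same = labels[:, None] == labels[None, :]
    w = np.where(same, w_total / (PER_SYL - 1), 0.0)
    np.fill_diagonal(w, 0.0)
    return w

def settle(hvc_input, w, g=10.0, dt=0.05, steps=2000):
    """Euler-integrate dr/dt = -r + relu(input + W r - g * mean(r)) starting from r = 0.

    The g * mean(r) term stands in for a single feedback-inhibitory assembly.
    """
    r = np.zeros(N)
    for _ in range(steps):
        drive = hvc_input + w @ r - g * r.mean()
        r += dt * (-r + np.maximum(drive, 0.0))
    return r

# HVc_RA input mixing two syllable representations, slightly favoring C over B.
hvc_input = np.zeros(N)
hvc_input[labels == 1] = 1.0  # B
hvc_input[labels == 2] = 1.2  # C
r = settle(hvc_input, learned_ra_weights())
winner = "ABCDE"[int(np.argmax([r[labels == s].mean() for s in range(N_SYL)]))]
```

With these values, the C assemblies settle at 0.8, the fixed point of $x = 1.2 + 1.5x - 2x$, and all other assemblies are silenced.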



Fig. 10. Syllable learning. Left column: strength of synaptic input coming from HVc_RA for 10 consecutive syllables. Middle column: intrinsic connections within RA. The darkness of each square represents the connection strength from one presynaptic RA assembly (horizontal axis) to one postsynaptic RA assembly (vertical axis). Self-connections (diagonal entries) are set to zero to prevent domination of self-correlations. Right column: RA activity level for same syllables shown on left. A: at the start of the simulation, initial RA connectivity was nearly uniform, and HVc input and RA output were random. B: as development proceeded, assemblies encoding a single tutor syllable began to have similar patterns of connectivity. Because assemblies encoding the same tutor syllable are arranged next to each other, the pattern of RA → RA connections began to show "blocks" of strong connections along the diagonal (middle). These assemblies also began to receive similar patterns of input from HVc (left). C: after learning, HVc input was a random mixture of syllable representations, and RA assemblies were connected only with other RA assemblies encoding the same tutor syllable. This pattern of intrinsic RA connectivity, combined with global inhibition (see METHODS), resulted in the production of patterns of RA activity matched to the tutor template (right). Learning to produce these syllables in the proper sequence is addressed in the following companion paper (Troyer and Doupe 2000).

TIME COURSE OF LEARNING. The developmental time course of song learning in our model is shown in Fig. 11. To quantify convergence toward the tutor song, we first computed an "ideal" syllable covariance matrix, Msyl. This is a 40 × 40 matrix containing the covariance in the level of activity between each pairing of the 40 RA assemblies (sampled over a number of consecutive syllables), where it is assumed that the model is producing a perfect rendition of the tutor song. Msyl has strong positive entries for pairs of assemblies belonging to the same tutor syllable and negative entries for pairs belonging to different syllables. We then divided the model output into 250-syllable epochs and computed the matrix of co-fluctuations in activity between each pair of RA assemblies over each epoch. Convergence toward the tutor song was quantified by computing the correlation coefficient between the entries in Msyl and those in the co-fluctuation matrix (Fig. 11, solid line). For detailed definitions of these calculations, see METHODS, Quantifying learning time course. We also computed the correlation coefficient between the pattern of RA connectivity and Msyl (Fig. 11, dashed line). The development of intrinsic RA connectivity is mirrored by the appearance of the corresponding correlations in RA activity.



Fig. 11. Summary of learning time course. Solid line: correlation coefficient between entries of the covariance matrix calculated from 250-syllable epochs of model output and the "ideal" syllable covariance matrix, Msyl, corresponding to the tutor song (see METHODS, Quantifying learning time course, for definition). Dashed line: correlation coefficient between pattern of RA connectivity and Msyl. Dotted line: time course of efference copy learning (Fig. 8B). Syllable learning begins soon after the development of an accurate efference copy at around syllable 1500 and is largely completed by syllable 10,000.

Syllable learning is complete by the time the model has produced 20,000 syllables. Since each syllable is assumed to be 115 ms long, this represents 20,000 × 0.115 s = 2,300 s, or just under 40 min, of continuous singing. Although quantitative data are not available, this is likely to be up to several orders of magnitude less than the quantity of song produced by young zebra finches during the period of sensorimotor learning. Of course, the model is solving a highly simplified task.

Problem 3: Separating motor and sensory signals in HVc

HVc_AFP receives two functionally distinct sets of inputs: efference copy inputs from HVc_RA and auditory feedback (Fig. 7). The unmixing of signals is addressed in our model by 1) keeping auditory feedback weak relative to efference copy input, and 2) including "adaptation" in HVc_AFP (Fig. 5). The action of the HVc_AFP adaptation mechanism is shown in Fig. 12. Excitation within HVc_AFP assemblies recruits a negative current that decays exponentially (Fig. 12A, bottom). When the efference copy input from HVc_RA correctly predicts the pattern of auditory feedback, adaptation (open arrow) counteracts the delayed auditory feedback (black arrow) and prevents inappropriate activity during the early portion of the next syllable (x). Adaptation also prevents auditory inputs from driving HVc_AFP activity during the gap portion between syllables (McCasland and Konishi 1981). If the efference copy does not predict the level of auditory feedback, the feedback will not be canceled, and this will result in an inaccurate efference copy during the subsequent syllable (not shown). However, associational learning triggered by such an "error response" in HVc_AFP will strengthen the connections from the HVc_RA assemblies that drove the strong auditory feedback, thereby improving the accuracy of future efference copy predictions. We quantified the efficacy of the cancellation mechanism by calculating the correlation coefficient between the pattern of HVc_AFP activity during the early portion of each syllable and the auditory feedback from the previous syllable. Initially, this quantity is small and positive, since auditory feedback is weak relative to the input from HVc_RA (Fig. 12B, solid line). As the efference copy map is learned (dashed line), the adaptation cancellation mechanism reduces the correlation with the auditory feedback signal to values approximately equal to zero.
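The cancellation can be sketched for a single HVc_AFP assembly. The assembly is stepped through the E, M, L, and G epochs of one syllable and the early epoch of the next. The adaptation variable decays by a fixed factor per epoch, which is exponential decay in discrete time, and it is recruited by activity. The values of `decay` and `gain` and the input levels are hypothetical. Feedback is set to 0.25 against an efference copy input of 1.0, so feedback supplies 20% of the input during the late epoch.

```python
def run_epochs(ec_syl1, fb, decay=0.8, gain=0.5):
    """One HVc_AFP assembly over epochs E1, M1, L1, G1 of syllable 1 and E2 of syllable 2.

    ec_syl1: efference copy input during syllable 1's premotor epochs (E, M, L).
    fb: delayed auditory feedback from syllable 1 (arrives in L1, G1, and E2).
    The assembly is not part of syllable 2, so it gets no efference copy there.
    """
    inputs = [ec_syl1, ec_syl1, ec_syl1 + fb, fb, fb]  # E1, M1, L1, G1, E2
    a, acts = 0.0, []
    for x in inputs:
        r = max(x - a, 0.0)       # adaptation acts as a subtractive negative current
        acts.append(r)
        a = decay * a + gain * r  # exponential decay plus recruitment by activity
    return acts

accurate = run_epochs(ec_syl1=1.0, fb=0.25)  # efference copy predicted the feedback
missing = run_epochs(ec_syl1=0.0, fb=0.25)   # no efference copy for this feature
print(accurate[-1], missing[-1])  # 0.0 (canceled) and about 0.0875 (uncanceled error response)
```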



Fig. 12. Cancellation of auditory feedback by adaptation in HVc_AFP. A: example of accurate cancellation (assembly #11, syllables 39-40). E, M, L, G mark the early, middle, late, and gap portions of each syllable (see Fig. 7). Premotor activity in HVc_RA during syllable 39 results in two separate inputs to HVc_AFP: a short-latency efference copy input and the delayed auditory feedback that it anticipates (top plots). Strong efference copy input leads to activity in HVc_AFP. Activity is seen throughout the syllable and is particularly strong during the late portion of the syllable, when the efference copy and auditory feedback overlap. This activity then recruits adaptation that decays exponentially (bottom). During the early portion of syllable 40, the adaptation (open arrow) is still strong enough to cancel the auditory feedback from syllable 39 (black arrow) and to prevent HVc_AFP activity (x). During syllable 40, HVc_AFP assembly #11 receives only a background level of efference copy input, which accurately predicts the lack of auditory feedback to assembly #11 for this syllable. Dotted line: mean level of efference copy input for all assemblies over the first 250 syllables produced by the model. B: correlation coefficient between the pattern of HVc_AFP activity during the early portion of each syllable and the pattern of auditory feedback from the previous syllable (solid line; averages taken in 250-syllable bins), and the quality of the efference copy mapping from HVc_RA → HVc_AFP (dashed line; see Fig. 8B). As the efference copy develops, cancellation by adaptation drives the correlation coefficients between HVc_AFP activity and auditory feedback from the previous syllable to near zero.

Range of model behavior

Our results demonstrate the plausibility of our hypothesis that an efference copy mapping is used to guide the sensorimotor learning of birdsong. A detailed assessment of how our model reacts to changes in its multiple parameters is beyond the scope of this paper. Here, we briefly demonstrate the model's robustness using default parameter values and describe the most common types of breakdown in model behavior. Results are shown for alterations in four parameters: 1) the learning rate; 2) the level of correlation within the initial pattern of synaptic connectivity; 3) the LTP/LTD threshold in RA; and 4) the strength of connections onto RA assemblies.

To test the robustness of the model, we ran the algorithm 10 times with parameters fixed at their default values. Simulations differed in the random seeds used to determine the initial synaptic connection strengths and the sequence of random premotor drives. The time course of learning in these simulations is shown in Fig. 13. Syllable learning followed a similar time course in all simulations, eventually resulting in accurate reproductions of the template syllables.



Fig. 13. Variability of learning time course. Development of syllable-related activity for 10 repeated simulations using default parameters. Simulations used different seeds to determine random components of initial connectivity and the sequence of premotor drives. Output quantified as in Fig. 11.

The most difficult problem encountered in constructing a working algorithm was the instability of learning due to the positive feedback inherent in associational plasticity, i.e., correlated activity leads to stronger connections, which in turn lead to more strongly correlated activity. This is a general and largely unresolved theoretical problem; in our model we addressed it using a variety of negative feedback mechanisms (see METHODS). When syllable learning did break down, two general types of errors were most common. First, spurious correlations caused errors during learning. For example, faster learning rates led to more rapid synaptic change, making spurious correlations more prominent. These spurious correlations were then amplified by associational learning. Alternatively, increased correlations within the initial pattern of synaptic connectivity could also be amplified, leading to degraded learning (see METHODS and APPENDIX). The parameter dependence of these effects is shown in Fig. 14A. Increasing the initial level of correlation resulted in gradual loss of learning. Increasing learning rates led to highly variable results, with one simulation showing perfect learning, even when learning rates were increased by a factor of 10. Figure 14B shows spurious correlations in a different simulation in which learning rates were increased by a factor of 10 (Fig. 14A, arrow). While many assemblies are co-active with others in the same tutor syllable representation, other assemblies are "incorrectly" co-active with assemblies from different syllables (Fig. 14B, left panel). These incorrect associations can be seen in the pattern of intrinsic RA connections, where strong connections are concentrated in blocks along the diagonal, but scattered inappropriate connections are seen as well (Fig. 14B, right panel; cf. Fig. 10).



Fig. 14. Errors in syllable learning. Top: amplification of random correlations. A: disruption of syllable-related activity from increased learning rates (circles) and increased correlation in the initial pattern of connectivity (marked +). Degree of syllable learning after 25,000 simulated syllables (quantified using correlation coefficients as in Fig. 11). B: RA activity (left) and RA → RA connectivity (right, compare Fig. 10) when learning rates were increased by a factor of 10 (arrow in A). Incorrect associations are made, with some RA assemblies co-active with and connected to assemblies encoding a different tutor syllable. Three "new syllables" have been created: one includes a single assembly from each of syllables B, C, and D (black arrows), another a single assembly from each of C and E (open arrows), and a third includes 2 assemblies from syllable A and 1 from syllable D (gray arrow). Inappropriate connections appear as dark entries outside of diagonal blocks (right). Bottom: increased competition in RA. C: disruption of syllable-related activity from an increased long-term potentiation/long-term depression (LTP/LTD) threshold in RA (circles) and increased synaptic strengths in RA (marked +). Degree of syllable learning after 50,000 simulated syllables. D: LTP/LTD threshold increased by a factor of 2 (arrow in C). Increased competition causes RA assemblies encoding a single tutor syllable to divide into "subclusters" of assemblies that are co-active and strongly connected. For example, syllable A has been cleanly split into 2 subclusters, whereas within syllable E, connectivity is still rather diffuse.

A second common type of error was caused by a misadjustment of competitive mechanisms in the network. The most important parameter contributing to competition between different representations was the LTP/LTD threshold (see METHODS). Competition could also be increased by increasing inhibition. We increased inhibition indirectly, by first scaling the total excitatory synaptic strength in RA, which then triggered a homeostatic increase in inhibition. The parameter dependence of these effects is shown in Fig. 14C. Figure 14D shows the effects of increased competition due to increasing the LTP/LTD threshold in RA by a factor of 2 (Fig. 14C, arrow). Because of the increased competition, syllable representations have been split into subsyllables, in which RA assemblies are strongly connected to and co-active with a subset of the assemblies encoding the same tutor syllable.


    DISCUSSION

Principal findings and predictions

By constructing a computational model, we have provided a theoretical demonstration that associational plasticity, guided by template comparison signals transmitted by the AFP, can account for the sensorimotor learning of birdsong syllables. The model incorporates a wide range of experimental data related to song learning and addresses the crucial problem of feedback delay during motor learning: the delay for reafferent auditory signals returning to RA via the AFP is estimated to be nearly as long as a typical song syllable. Our model suggests that the bird solves this problem by generating an internal prediction, or efference copy, of the expected auditory reafference within the song nucleus HVc. This efference copy is then compared with the stored template to guide song learning. Thus, we predict that activity recorded in the AFP during singing (Hessler and Doupe 1999a,b) is a motor signal encoding sensory information.

Experiments designed to test this prediction face two significant challenges. First, the nature of both sensory and motor representations within the song system is poorly understood. Direct tests will require substantial progress in this area. Second, separating the sensory and motor aspects of neural activity during sensorimotor behaviors is notoriously difficult. For example, the auditory processing of a bird's own feedback during singing, and the processing of the same auditory signal when not singing, may be quite different. One approach to this problem is to alter auditory feedback pathways and monitor neural activity and/or motor output during singing. Our model predicts that changes in song due to altering the auditory feedback pathway should be indirect effects of perturbing the efference copy mapping, which then causes errors in the motor output. Since we assume that auditory feedback does not play an active role during vocal production, we expect that removal of auditory feedback by deafening should result in no immediate change in vocal output, consistent with data from many avian species (Konishi 1965; Nottebohm 1968; Price 1979). Use of an efference copy is also consistent with the slow degradation of song after complete removal of auditory feedback by deafening (Nordeen and Nordeen 1992) and the more rapid degradation seen after perturbation of auditory feedback by consistent playback of auditory signals any time the bird sings (Leonardo and Konishi 1999). Altered feedback is expected to result in an active and hence more rapid alteration of the efference copy mapping, whereas removal of feedback could allow a passive drift in the efference copy map, resulting in a slow degradation of song. Note that these data are not naturally accounted for by "error-based" learning hypotheses, since deafening results in a large change in the sensory signal. 
By retaining the key elements of the AFP comparison hypothesis, our model is consistent with the finding that AFP lesions prevent the disruption of song due to deafening (Brainard and Doupe 2000).

In addition to being consistent with the behavioral data, our model makes the specific prediction that a mismatch between actual and expected auditory feedback should elicit a detectable change in the song-related activity of HVc_AFP neurons. This change may also be indirectly registered in AFP neurons (as well as HVc_RA neurons, see Troyer and Doupe 2000). If the mismatch is sustained, increasingly significant changes in song-related neural activity should be seen in HVc and the AFP over time. Since a change in AFP output is required to alter the connectivity in RA in the model, these changes should be detectable before significant changes appear in the patterns of RA motor activity or in song output.

At the level of song circuitry, the model predicts that the motor-to-sensory transformation necessary for an efference copy is learned within the connections between the two populations of HVc projection neurons. Consistent with earlier suggestions based on physiological evidence, the model makes the further anatomical hypothesis that auditory afferents to HVc should preferentially (although not necessarily exclusively) synapse onto AFP-projecting neurons while premotor afferents should preferentially synapse onto RA-projecting HVc neurons (Katz and Gurney 1981; Kimpo and Doupe 1997; Lewicki 1996; Saito and Maekawa 1993).

In addition to the problem of feedback delay, our model solves a second problem posed by the AFP comparison hypothesis: if the AFP were to directly evaluate auditory feedback signals, these signals would have to bypass strong, ongoing premotor activity within HVc (Fig. 1, C and D; McCasland 1987; McCasland and Konishi 1981; Yu and Margoliash 1996; but see Foster and Bottjer 1998). By proposing that the AFP evaluates an efference copy, our model circumvents this problem and suggests a functional reason why the AFP lies downstream of HVc: use of an efference copy requires that template comparison take place downstream of the motor pattern generator. Because auditory inputs are necessary for efference copy learning, however, our solution does not eliminate the problems raised by the mixing of motor and sensory signals within HVc. We predict that these signals are kept separate in HVc both by the greater strength of the efference copy signal and by the cancellation of auditory reafference (McCasland and Konishi 1981) by strong adaptation mechanisms (synaptic or neuronal) within HVc_AFP. A slow after-spike hyperpolarization recently found in AFP-projecting HVc neurons (Dutar et al. 1998) may contribute to this cancellation. Because the cancellation depends on an accurate efference copy, our model predicts that auditory signals recorded in HVc or the AFP should be stronger in very young birds than in juveniles or adults.

Finally, we propose that circuits intrinsic to RA play an important role in encoding the motor programs for individual song syllables (cf. Spiro et al. 1999). This proposal is consistent with anatomical data (Herrmann and Arnold 1991) as well as the hypothesis that the precision of RA activity (Yu and Margoliash 1996) emerges as a result of neural circuitry intrinsic to RA, rather than being driven by (temporally less precise) inputs from HVc.

Alternative models and mechanisms

Our model constitutes a sufficiency argument, i.e., the model demonstrates that the proposed hypotheses are sufficient to solve important problems related to song learning. However, experimental data related to song learning are simply too sparse to rule out a wide range of possibilities. In particular, there are several alternatives to our proposal that efference copy is used to mitigate the problem of feedback delay. For example, it is possible that some song learning is done "off-line," i.e., aspects of motor activity and sensory reafference may be stored in medium- or long-term memory and used to readjust the motor circuit when the bird is not singing, perhaps during sleep (Dave et al. 1999). A more likely alternative is the use of short-term or "working" memory mechanisms. At the synaptic level, "memory traces" (Houk et al. 1995) could "tag" (Frey and Morris 1998) a synaptic site, making it receptive to delayed signals related to template comparison. At the network level, activity related to the motor command could be maintained within a feedback circuit and then compared with the auditory feedback when it arrives. These proposals raise a number of questions that have yet to be investigated. Most notable is the difficulty of directing memory signals toward the appropriate connections in a manner that is not disturbed by the presence of strong ongoing motor activity. The problem of segregating motor and sensory signals is also shared by song learning models in which HVc both generates premotor commands and passes auditory feedback information on to the AFP. For example, Doya and Sejnowski (1998) have proposed a model that is similar to ours in adopting the AFP comparison hypothesis and using reinforcement learning to guide song learning in RA (see also Fry 1996). However, Doya and Sejnowski (1998) do not address the problems of feedback delay and sensory/motor mixing in HVc that lie at the core of our model.

Weaknesses of the model

Our model has a number of weaknesses. The most important of these is our strong simplifying assumptions regarding the encoding of sensory and motor information related to song. These simplifications were chosen for two main reasons. First, extremely little is known about the manner in which song is encoded in the patterns of neural activity distributed across the various song nuclei. Second, our theoretical understanding of Hebbian learning rules is limited. In particular, the tendency for these rules to amplify "spurious" correlations has only been addressed in networks of limited complexity.

Another weakness of the model is that the auditory feedback problem has only been partially addressed. Using our estimates, AFP comparison signals will arrive in RA with a delay of roughly 40 ms. While this does prevent significant overlap with the motor activity related to the next syllable, 40 ms may still represent a significant delay given that RA motor activity is time-locked to the motor output with a precision of less than 5 ms (Yu and Margoliash 1996). One possibility for addressing this problem would be to use internal regularities in the motor program and temporally asymmetric learning rules to anticipate the future state of the motor program. Efference copies of this predicted motor command could be processed in the AFP and arrive in RA time locked to the arrival of the actual motor command from HVc. Overall, the model points to the need for better data concerning the temporal relationships between activity patterns in the various song nuclei during singing. Our estimate of AFP processing time is based on variable auditory latencies recorded in anesthetized birds, and hence, is only poorly constrained. Better timing estimates obtained by microstimulation and/or correlation analyses in singing birds could yield important information regarding the functional interactions between various nuclei during song production.

Related to the issue of processing delays is the fact that temporal aspects of song change with development. Generally, syllables are longer and produced at a slower tempo in young birds (Immelmann 1969). Thus, one possible solution to the problem of feedback delay is that it may simply be a smaller problem in juveniles. However, this solution depends on the motor commands for the slow juvenile syllables being nearly identical to those for the more rapid syllables sung in adulthood. Furthermore, even though syllables are longer, neural processing may also be slower in young birds, thereby increasing feedback delay.

Location of the template

Our model is built on the working assumption that the memorized template is stored in the AFP. While the data pointing to the AFP as a candidate site are suggestive (Basham et al. 1996; Bottjer et al. 1984; Scharff and Nottebohm 1991; Sohrabji et al. 1990), direct physiological tests have been equivocal (Doupe and Solis 1997; Solis and Doupe 1997, 1999). Consequently, we have begun to generalize our model to consider the possibility that template information is stored in auditory areas closer to the periphery (for candidate sites, see Bolhuis et al. 2000; Foster and Bottjer 1998; Mello et al. 1998; Vates et al. 1996). Initial simulations suggest that song learning can be guided by auditory feedback that reaches the song system only after it has been filtered through neurons selective for the tutor song. This "template filter hypothesis" raises the possibility that sensorimotor learning may involve the transfer of template information into the song system. The fact that this transfer would rely on the bird's own vocalization serving as a carrier signal is consistent with experimental data showing that AFP neurons develop selective auditory responses to both the bird's own song and the tutor song (Doupe 1997; Solis and Doupe 1997, 1999) during the sensorimotor phase of song learning. Note that efference copy may still play an important role, since a different template storage site does little to alter the basic problem of feedback delay. Further investigations are required to fully explore these possibilities.

Role of efference copy

We have used the term efference copy to refer to a motor signal that has been converted to sensory coordinates. Our use of efference copy is most similar to the notion of a forward model used for motor learning and control (Jordan 1995; Miall and Wolpert 1996; Miall et al. 1993): an internal prediction that is compared with a target reference, in our case, the tutor song. Our model differs from standard motor control models in that the efference copy is primarily used to modulate plasticity rather than to control ongoing vocalization. Moreover, template comparison in our model does not result in an "error" or "mismatch" signal, i.e., the difference between the tutor template and the bird's own song is never computed. Instead, the model relies on "matching" signals that could be easily computed by neurons receiving input from a population of cells broadly tuned to the tutor song.

Our model also uses efference copy in its classic role as a negative image used to "subtract off" sensory reafference (Bell et al. 1997; Sperry 1950; von Holst and Mittelstaedt 1980). However, the purpose of the cancellation is to prevent interference with the ongoing motor program, not to differentiate the sensory signals that are due to an animal's own behavior from those caused by events in the external world. Furthermore, this negative image is a secondary effect in our model, resulting from adaptation mechanisms within HVc_AFP (Fig. 5).

Random motor behavior and innate templates

Our model uses reinforcement learning to refine initially random activity into motor commands matched to an internal template. A major drawback of reinforcement learning is the "curse of dimensionality," i.e., if motor space contains too many degrees of freedom, the chance of randomly activating an appropriate combination of motor neurons is exceedingly low. In our model, RA premotor neurons have been preorganized into a small number (40) of motor assemblies. Thus, random activity in the model is actually confined to a relatively narrow range of possible vocal productions, allowing successful reinforcement-based learning after a relatively brief period of vocal development. In this way, our model can be seen as relying on an "innate template" (reviewed in Marler 1997) to reduce the dimensionality of motor space.

Efference copy may be common in many systems

The close parallels between vocal learning in birds and humans (Doupe and Kuhl 1999) suggest that efference copy may also play a role during speech development. For example, speech is slowly degraded in humans deafened as adults (Cowie and Douglas-Cowie 1992; Waldstein 1989), but can be altered within an hour by systematic alterations of auditory feedback (Houde and Jordan 1998). Moreover, mismatches between expected and received auditory feedback cause increased activation in auditory language areas in temporal cortex (Hirano et al. 1997; McGuire et al. 1996). These results are entirely consistent with the efference copy hypothesis: passive drift after deafening and a more active alteration of the efference copy with altered feedback, perhaps via an association of motor commands with the mismatch signal registered in temporal cortex.

At a general level, our model focuses on the interaction between reciprocally connected populations of neurons, where one population has been assigned a primarily motor role and the other a primarily sensory role. This dichotomy parallels traditional views of the motor/sensory circuits subserving language (Wernicke 1908) and of the frontal/parietal circuits underlying memory-guided reaching and saccade behaviors (Chafee and Goldman-Rakic 1998). Efference copy learning may be a natural consequence of Hebbian learning within such a circuit. Our model suggests that this learning is expected to occur whenever 1) a projection exists from neurons displaying motor activity to neurons that receive sensory inputs, and 2) the time window for associative learning is roughly matched to the sensory feedback delay. The simplicity of these conditions argues that use of an efference copy may be a common strategy for overcoming feedback delay in a wide variety of circuits subserving sensorimotor learning.


    APPENDIX

This appendix contains details of the implementation of our computational algorithm. We abbreviate HVc_RA as HR, HVc_AFP as HA, and AFP as AF. $r^{HR}$, $r^{RA}$, $r^{HA\text{-}E}$, $r^{HA\text{-}M}$, $r^{HA\text{-}L}$, $r^{HA\text{-}G}$, and $r^{AF}$ denote firing rates, where E, M, L, G refer to the early, middle, late, and gap portions of HVc_AFP activity (see Fig. 7). $r^{EC}$ denotes the HVc_AFP efference copy activity passed to the AFP and is calculated as the average HVc_AFP activity during the early (25-ms long) and middle (35-ms long) portions of the current syllable: $r^{EC} = (25\,r^{HA\text{-}E} + 35\,r^{HA\text{-}M})/(25 + 35)$. [HA, HR], [RA, HR], and [RA, RA] denote the three sets of excitatory synaptic connections that undergo learning, where [post, pre] denotes a matrix of synaptic strengths. For example, $[HA, HR]_{ij}$ is the connection strength from the jth HVc_RA assembly to the ith HVc_AFP assembly. We use $|x|_+ = \max(x, 0)$ to denote rectification, and $\langle x \rangle = (1/N)\sum_{i=1}^{N} x_i$ to denote averaging. Values of most parameters are expressed in arbitrary units calibrated so that the homeostatically controlled average firing rate, and the input/output gain ($\text{output} = \text{gain} \times |\text{net input} - \text{threshold}|_+$), are equal to 1. For HVc_AFP, average firing rate was the weighted average of firing over all four portions of the syllable.
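The notation above maps onto a few small helper functions. The Python sketch below is illustrative only (the original simulations were run in MATLAB), and the function names are hypothetical choices made for this example.

```python
def rectify(x):
    """|x|_+ = max(x, 0)."""
    return max(x, 0.0)

def average(xs):
    """<x> = (1/N) * sum_i x_i."""
    return sum(xs) / len(xs)

def efference_copy_rate(r_early, r_middle, early_ms=25.0, middle_ms=35.0):
    """r^EC: duration-weighted mean of the early- and middle-portion HVc_AFP rates."""
    return (early_ms * r_early + middle_ms * r_middle) / (early_ms + middle_ms)

def unit_output(net_input, threshold, gain=1.0):
    """output = gain * |net input - threshold|_+ (gain is calibrated to 1 in the model)."""
    return gain * rectify(net_input - threshold)

# An assembly that fires only during the early portion contributes 25/60 of its rate to r^EC.
print(efference_copy_rate(1.0, 0.0))
```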

The calculation of neural activity was based on the following rule: $r_i = |\text{input} - \text{adaptation} - \text{inhibition} - \theta|_+$, where $\theta = 1$ is the spike threshold. Only HVc_AFP includes adaptation. Inhibition is calculated as $G_i I$, where $I$ represents the activity in a single local inhibitory assembly, and $G_i$ is the inhibitory strength onto excitatory assembly $i$. In HVc_RA, HVc_AFP, and the AFP, inhibition is "feedforward": inhibitory activity $I$ is set equal to the average afferent (feedforward) input received by the population, minus a threshold, i.e., $I = |\langle \text{aff} \rangle - \theta_I|_+$. $\theta_I$ was set to 20% of the target level of input for these populations (see step 8a below): $\theta_I^{HR} = \theta_I^{HA} = 4$, and $\theta_I^{AF} = 3$. RA included feedback dynamics (described below, step 3). RA inhibition is also "feedback," i.e., $I$ is set equal to the mean level of activity within RA, minus a threshold: $I = |\langle r^{RA} \rangle - \theta_I^{RA}|_+$ with $\theta_I^{RA} = 0.20$.
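A minimal Python sketch of the activity rule with a single feedforward inhibitory assembly (HVc_RA, HVc_AFP, AFP) and of the feedback inhibition used in RA. It assumes the thresholds stated above. The inhibitory strengths $G_i$ are set homeostatically in the model, so the values passed in the example are hypothetical.

```python
def rectify(x):
    """|x|_+ = max(x, 0)."""
    return max(x, 0.0)

def feedforward_rates(aff, G, theta_I, theta=1.0, adaptation=None):
    """r_i = |aff_i - a_i - G_i * I - theta|_+ with feedforward I = |<aff> - theta_I|_+.

    `adaptation` (a_i) is used only for HVc_AFP; leave it as None for other populations.
    """
    if adaptation is None:
        adaptation = [0.0] * len(aff)
    I = rectify(sum(aff) / len(aff) - theta_I)  # one inhibitory assembly driven by mean afferent input
    return [rectify(a - ad - g * I - theta) for a, ad, g in zip(aff, adaptation, G)]

def feedback_inhibition(r_ra, theta_I_ra=0.20):
    """RA only: I = |<r^RA> - theta_I^RA|_+, computed from RA activity rather than afferents."""
    return rectify(sum(r_ra) / len(r_ra) - theta_I_ra)

# HVc_RA-like example: uniform input at the target level 20 with theta_I = 4 and G_i = 1 (hypothetical).
print(feedforward_rates([20.0] * 4, [1.0] * 4, theta_I=4.0))
```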

Simulations

For each syllable, n, we performed the following nine steps. These were repeated for a fixed number of syllables (usually 25,000).

1. Generate premotor drive, $p_i(n)$, onto each HVc_RA assembly, $i$. First we generate random variables $p_i = |\eta|_+$, where $\eta$ was generated from a Gaussian distribution with mean equal to 3 and variance equal to 1. Then $p_i(n) = p_{\text{drive}}\, p_i / \langle p \rangle$, where $p_{\text{drive}} = 20$ is a constant that determines the magnitude of the drive.

2. Calculate HVc_RA activity. The input afferent to HVc_RA is equal to the premotor drive: $\text{aff}_i^{HR}(n) = p_i(n)$. Output firing rates are determined from $r_i^{HR}(n) = |\text{aff}_i^{HR}(n) - G_i^{HR} I - \theta|_+$, with $I = |\langle \text{aff}^{HR}(n) \rangle - \theta_I^{HR}|_+$.
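Steps 1 and 2 can be sketched together in Python, since the HVc_RA afferent input is the premotor drive itself. The sketch assumes the stated parameters (mean 3, variance 1, $p_{\text{drive}} = 20$, $\theta = 1$, $\theta_I^{HR} = 4$). The number of HVc_RA assemblies is not given in this section, so 40 is a hypothetical choice, as are the uniform inhibitory strengths $G_i^{HR} = 1$.

```python
import random

def rectify(x):
    """|x|_+ = max(x, 0)."""
    return max(x, 0.0)

def premotor_drive(n_assemblies, rng, p_drive=20.0, mean=3.0, sd=1.0):
    """Step 1: p_i = |eta|_+ with eta ~ N(3, 1), rescaled so that <p(n)> = p_drive."""
    raw = [rectify(rng.gauss(mean, sd)) for _ in range(n_assemblies)]
    raw_mean = sum(raw) / len(raw)
    return [p_drive * p / raw_mean for p in raw]

def hvc_ra_rates(drive, G, theta=1.0, theta_I=4.0):
    """Step 2: aff^HR = p(n); r_i = |aff_i - G_i * I - theta|_+ with I = |<aff> - theta_I|_+."""
    I = rectify(sum(drive) / len(drive) - theta_I)
    return [rectify(a - g * I - theta) for a, g in zip(drive, G)]

rng = random.Random(0)     # seed is arbitrary
N_HR = 40                  # hypothetical number of HVc_RA assemblies
G_HR = [1.0] * N_HR        # hypothetical inhibitory strengths (homeostatic in the model)
p = premotor_drive(N_HR, rng)
r_hr = hvc_ra_rates(p, G_HR)
```

Because the drive is rescaled to a mean of exactly 20, the feedforward inhibition is the same on every syllable ($I = 16$ here). Only assemblies whose drive exceeds $G_i I + \theta$ become active.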

3. Calculate RA activity. The afferent input is calculated as $\text{aff}_i^{RA}(n) = \sum_j [RA, HR]_{ij}\, r_j^{HR}(n)$. To calculate the output firing rates, the following dynamics were simulated:

$$\dot{u}_i(t) = -u_i(t) + \Big(\text{aff}_i^{RA}(n) + \sum_j [RA, RA]_{ij}\, r_j^{RA}(t)\Big) - G_i^{RA}\,\big|\langle r^{RA}(t) \rangle - \theta_I^{RA}\big|_+$$

$$r_i^{RA}(t) = |u_i(t) - \theta|_+$$

$$u_i(0) = \text{aff}_i^{RA}(n) - \langle \text{aff}^{RA} \rangle + \theta$$
where $u_i$ should be thought of as the typical membrane potential for neurons in assembly $i$. The dynamics were simulated on the interval [0, 2] using the MATLAB command "ode23". To monitor convergence, every 250 syllables the dynamic simulations were continued over the interval [0, 10], and the root-mean-square (RMS) difference between RA activity at the end of the short and long intervals was calculated: $\text{RMS} = (1/40)\{\sum_i [r_i^{RA}(2) - r_i^{RA}(10)]^2\}^{1/2}$. During most of the learning, simulations over the short interval resulted in near convergence of the dynamics (RMS < 0.1). Convergence was not complete during the final period of syllable learning (RMS > 0.1 for syllables 10,500-16,250). Incomplete convergence will favor afferent over recurrent contributions to the final RA activity pattern but should not noticeably alter our results. This was borne out in a few (computationally intensive) simulations in which long intervals were used during every syllable.
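Step 3 can be sketched in plain Python. This illustrates the equations above under stated assumptions and is not the original code. MATLAB's `ode23` is an adaptive Bogacki-Shampine 2(3) solver, whereas this sketch substitutes a fixed-step classical Runge-Kutta (RK4) integrator with a hypothetical step size of 0.01. The recurrent weights `W` ([RA, RA]) and inhibitory strengths `G` are supplied by the caller, and `convergence_rms` follows the RMS formula exactly as printed above.

```python
import math

def rectify(x):
    """|x|_+ = max(x, 0)."""
    return max(x, 0.0)

def ra_rates(u, theta=1.0):
    """r_i^RA = |u_i - theta|_+."""
    return [rectify(ui - theta) for ui in u]

def ra_du_dt(u, aff, W, G, theta=1.0, theta_I=0.2):
    """du_i/dt = -u_i + (aff_i + sum_j W_ij r_j) - G_i |<r> - theta_I|_+."""
    r = ra_rates(u, theta)
    inhibition = rectify(sum(r) / len(r) - theta_I)  # feedback inhibition from the mean RA rate
    return [
        -u[i] + aff[i] + sum(W[i][j] * r[j] for j in range(len(r))) - G[i] * inhibition
        for i in range(len(u))
    ]

def simulate_ra(aff, W, G, t_end=2.0, h=0.01, theta=1.0, theta_I=0.2):
    """Integrate from u_i(0) = aff_i - <aff> + theta with fixed-step RK4; return r^RA(t_end)."""
    mean_aff = sum(aff) / len(aff)
    u = [a - mean_aff + theta for a in aff]

    def f(v):
        return ra_du_dt(v, aff, W, G, theta, theta_I)

    def shift(v, k, c):
        return [vi + c * ki for vi, ki in zip(v, k)]

    for _ in range(round(t_end / h)):
        k1 = f(u)
        k2 = f(shift(u, k1, h / 2))
        k3 = f(shift(u, k2, h / 2))
        k4 = f(shift(u, k3, h))
        u = [ui + h / 6 * (a + 2 * b + 2 * c + d) for ui, a, b, c, d in zip(u, k1, k2, k3, k4)]
    return ra_rates(u, theta)

def convergence_rms(r_short, r_long):
    """Convergence measure as printed: (1/N) * sqrt(sum_i (r_i(2) - r_i(10))^2), with N = 40 in the model."""
    n = len(r_short)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(r_short, r_long))) / n

# Usage with two uncoupled assemblies (W = 0, G = 0), where u_i relaxes exponentially toward aff_i.
aff = [2.0, 0.5]
W = [[0.0, 0.0], [0.0, 0.0]]
G = [0.0, 0.0]
r_short = simulate_ra(aff, W, G, t_end=2.0)
r_long = simulate_ra(aff, W, G, t_end=10.0)
```

In this uncoupled case, the solution is $u_i(t) = \text{aff}_i + (u_i(0) - \text{aff}_i)e^{-t}$, which provides an analytic check on the integrator.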

4. Calculate HVc_AFP activity. Separate calculations were made for the four syllable subdivisions (E, M, L, G; Fig. 7):

$$\text{aff}_i^{HA\text{-}E}(n) = \sum_j \Big([HA, HR]_{ij}\, r_j^{HR}(n) + F_{ij}\, r_j^{RA}(n-1)\Big)$$

$$\text{aff}_i^{HA\text{-}M}(n) = \sum_j [HA, HR]_{ij}\, r_j^{HR}(n)$$

$$\text{aff}_i^{HA\text{-}L}(n) = \sum_j \Big([HA, HR]_{ij}\, r_j^{HR}(n) + F_{ij}\, r_j^{RA}(n)\Big)$$

$$\text{aff}_i^{HA\text{-}G}(n) = \sum_j F_{ij}\, r_j^{RA}(n)$$

where F is the (nonplastic) matrix that determines the transformation from RA activity into auditory feedback and is equal to the identity matrix times a constant (=4) that sets the overall strength of auditory feedback.

Output firing rates are determined from $r_i^{HA}(n) = |\text{aff}_i^{HA}(n) - a_i - G_i^{HA} I - \theta|_+$, with $I = |\langle \text{aff}^{HA}(n) \rangle - \theta_I^{HA}|_+$. The level of adaptation, $a_i$, was updated after calculating activity for each of the syllable subdivisions (see step 8c).
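The four afferent equations and the rate rule of step 4 are sketched in Python below. The sketch assumes $F = 4\,\mathbb{1}$, so HVc_AFP assembly $i$ receives feedback only from RA assembly $i$. The adaptation update of step 8c is not described in this section, so adaptation is simply passed in and held fixed. The weight matrix, rates, adaptation levels, and inhibitory strengths in the example are hypothetical.

```python
def rectify(x):
    """|x|_+ = max(x, 0)."""
    return max(x, 0.0)

def matvec(W, r):
    """Return W @ r for a list-of-lists matrix."""
    return [sum(w * x for w, x in zip(row, r)) for row in W]

def hvc_afp_afferents(W_ha_hr, r_hr, r_ra, r_ra_prev, feedback_gain=4.0):
    """Afferent input for the E, M, L, G portions of syllable n, with F = feedback_gain * identity."""
    ec = matvec(W_ha_hr, r_hr)                        # efference copy from HVc_RA
    fb_prev = [feedback_gain * x for x in r_ra_prev]  # delayed feedback from syllable n - 1
    fb_now = [feedback_gain * x for x in r_ra]        # feedback from syllable n
    return {
        "E": [e + f for e, f in zip(ec, fb_prev)],
        "M": ec,
        "L": [e + f for e, f in zip(ec, fb_now)],
        "G": fb_now,
    }

def hvc_afp_rates(aff, adaptation, G, theta=1.0, theta_I=4.0):
    """r_i = |aff_i - a_i - G_i * I - theta|_+ with I = |<aff> - theta_I|_+."""
    I = rectify(sum(aff) / len(aff) - theta_I)
    return [rectify(a - ad - g * I - theta) for a, ad, g in zip(aff, adaptation, G)]

aff = hvc_afp_afferents([[1.0, 0.0], [0.0, 1.0]], r_hr=[2.0, 0.0], r_ra=[0.0, 3.0], r_ra_prev=[1.0, 0.0])
# Gap portion: fixed adaptation on assembly 1 cancels its feedback (inhibition switched off for clarity).
r_gap = hvc_afp_rates(aff["G"], adaptation=[0.0, 11.0], G=[0.0, 0.0])
```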

5. Calculate AFP activity. $\text{aff}_k^{AF}(n) = \sum_j T_{kj} \sqrt{r_j^{EC}(n)}$, where T is the connection matrix from HVc_AFP to the AFP that encodes tutor syllables (Fig. 6B): $T_{kj} = 1.875$ if assembly j belongs to tutor syllable k, and $T_{kj} = 0$ otherwise. A sublinear (square root) function is included so that a better match is obtained when efference copy activity is distributed equally among the assemblies encoding a given tutor syllable, rather than being concentrated in just a few assemblies. Output firing rates were calculated as $r_k^{AF}(n) = |\text{aff}_k^{AF}(n) - G_k^{AF} I - \theta|_+$, with $I = |\langle \text{aff}^{AF}(n) \rangle - \theta_I^{AF}|_+$.
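The effect of the square-root nonlinearity can be checked with a small Python sketch. It assumes $T_{kj} = 1.875$ for the assemblies of tutor syllable $k$, $\theta = 1$, and $\theta_I^{AF} = 3$. The syllable membership lists and inhibitory strengths in the example are hypothetical.

```python
import math

def rectify(x):
    """|x|_+ = max(x, 0)."""
    return max(x, 0.0)

def afp_afferents(r_ec, syllable_members, t_weight=1.875):
    """aff_k = sum_j T_kj * sqrt(r_j^EC), where T_kj = t_weight if assembly j encodes syllable k, else 0."""
    return [t_weight * sum(math.sqrt(r_ec[j]) for j in members) for members in syllable_members]

def afp_rates(aff, G, theta=1.0, theta_I=3.0):
    """r_k = |aff_k - G_k * I - theta|_+ with I = |<aff> - theta_I|_+."""
    I = rectify(sum(aff) / len(aff) - theta_I)
    return [rectify(a - g * I - theta) for a, g in zip(aff, G)]

# The same total efference copy activity (4.0) spread over, or concentrated on, one syllable's assemblies.
spread = afp_afferents([1.0, 1.0, 1.0, 1.0], [[0, 1, 2, 3]])        # [7.5]
concentrated = afp_afferents([4.0, 0.0, 0.0, 0.0], [[0, 1, 2, 3]])  # [3.75]
```

Because $\sqrt{\cdot}$ is concave, evenly distributed efference copy activity drives the AFP syllable unit twice as strongly as the same total activity concentrated in a single assembly.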

6. Calculate reinforcement. $R_k^{syl}(n) = |\hat{R}^{syl}\, r_k^{AF}(n) - \phi_k|_+$ is the contribution to the reinforcement from the match to the kth tutor syllable, where $\phi_k$ represents a threshold that is adjusted homeostatically (see below, 7c). $\hat{R}^{syl} = 5$ is a constant that determines how large $\phi_k$ must be to keep $R_k^{syl}$ controlled. A large $\hat{R}^{syl}$ requires a large value for $\phi_k$ and hence yields significant reinforcement only for the best matches. The total reinforcement signal is $R(n) = c_R(0.15 + 0.85\,R^{syl}(n))$, where $c_R = 20$ determines the overall magnitude of the reinforcement signal. Note that 15% of the reinforcement signal is independent of the template match. This is included to be consistent with our model of sequence learning (see Troyer and Doupe 2000).
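A minimal Python sketch of step 6, assuming $\hat{R}^{syl} = 5$ and $c_R = 20$. This section does not state how the per-syllable terms $R_k^{syl}$ are combined into $R^{syl}(n)$. The `combine=sum` default below is therefore an assumption and can be replaced, for example by `max`. The thresholds $\phi_k$ in the example are hypothetical.

```python
def rectify(x):
    """|x|_+ = max(x, 0)."""
    return max(x, 0.0)

def syllable_reinforcement(r_af, phi, r_hat=5.0):
    """R_k^syl = |R_hat * r_k^AF - phi_k|_+ for each tutor syllable k."""
    return [rectify(r_hat * r - p) for r, p in zip(r_af, phi)]

def total_reinforcement(r_af, phi, r_hat=5.0, c_r=20.0, combine=sum):
    """R(n) = c_R * (0.15 + 0.85 * R^syl(n)); `combine` (how R^syl is formed) is an ASSUMPTION."""
    r_syl = combine(syllable_reinforcement(r_af, phi, r_hat))
    return c_r * (0.15 + 0.85 * r_syl)

phi = [4.0, 4.0]                                   # hypothetical homeostatic thresholds
matched = total_reinforcement([1.0, 0.2], phi)     # syllable 0 exceeds its threshold
unmatched = total_reinforcement([0.0, 0.0], phi)   # only the template-independent 15% remains
```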

7. Update synaptic strengths. Synaptic plasticity was based on the following rule
$$\Delta[\mathrm{post},\mathrm{pre}]_{ij}(n) = k^{\mathrm{post,pre}} \int_{t_{\mathrm{start}}}^{t_{\mathrm{end}}} dt^{\mathrm{pre}} \int_{t_{\mathrm{start}}}^{t_{\max}} dt\, \left(\alpha(t - t^{\mathrm{pre}})\,\rho^{\mathrm{post}}_i(t) - \psi^{\mathrm{post}}_i\right) r^{\mathrm{pre}}(t^{\mathrm{pre}})$$
$k^{\mathrm{post,pre}}$ is a constant determining the rate of synaptic plasticity in that pathway ($k^{\mathrm{HA,HR}} = 5 \times 10^{-5}\ \mathrm{ms}^{-2}$, $k^{\mathrm{RA,HR}} = 1 \times 10^{-12}\ \mathrm{ms}^{-2}$, and $k^{\mathrm{RA,RA}} = 2 \times 10^{-13}\ \mathrm{ms}^{-2}$). $t_{\mathrm{start}}$ and $t_{\mathrm{end}}$ delimit the arrival of presynaptic activity related to syllable n; they are calculated using syllable lengths and latencies as in Fig. 2, A and B. The gap between syllables was considered part of the preceding syllable (but see Williams and Staples 1992). $\rho^{\mathrm{post}}$ is the postsynaptic activity relevant for plasticity: $\rho^{\mathrm{HA}} = r^{\mathrm{HA}}$ and $\rho^{\mathrm{RA}} = R\,r^{\mathrm{RA}}$, so that plasticity in RA is gated by the reinforcement signal R. The threshold ψ was proportional to the running average of postsynaptic activity ($\bar{\rho}^{\mathrm{post}}_i$, see step 9): $\psi^{\mathrm{post}}_i = b^{\mathrm{post}}\,\bar{\rho}^{\mathrm{post}}_i(n-1)$, with $b^{\mathrm{HA}} = 0.08$ and $b^{\mathrm{RA}} = 1$.

To implement our plasticity rule, we divided time into intervals of constant pre- and postsynaptic activity, $[t^{\mathrm{post}}, t^{\mathrm{post}} + \tau^{\mathrm{post}}]$ and $[t^{\mathrm{pre}}, t^{\mathrm{pre}} + \tau^{\mathrm{pre}}]$, that were either 1) completely overlapping ($t^{\mathrm{pre}} = t^{\mathrm{post}}$, $\tau^{\mathrm{pre}} = \tau^{\mathrm{post}}$) or 2) completely nonoverlapping ($t^{\mathrm{pre}} + \tau^{\mathrm{pre}} < t^{\mathrm{post}}$). We then rewrote our rule as
$$\Delta[\mathrm{post},\mathrm{pre}]_{ij}(n) = k^{\mathrm{post,pre}} \left(C\,\tau^{\mathrm{pre}}\tau^{\mathrm{post}}\right) r^{\mathrm{pre}} \left(\rho^{\mathrm{post}}_i\,\bar{\alpha} - \psi^{\mathrm{post}}_i\right)$$
where $\bar{\alpha}$ is the average value of $\alpha(s^{\mathrm{post}} - s^{\mathrm{pre}})$ over times $s^{\mathrm{post}}$ and $s^{\mathrm{pre}}$ lying in the appropriate intervals with $s^{\mathrm{post}} > s^{\mathrm{pre}}$, and C is the proportion of the time that $s^{\mathrm{post}} > s^{\mathrm{pre}}$: C = 1/2 for condition 1) and C = 1 for condition 2). In setting $t_{\max}$, we assumed that the time course of RA plasticity was sufficiently rapid that only within-syllable associations need be considered, so we need only specify the average value of α, $\bar{\alpha} = 1$. In HVc_AFP, only associations within a syllable and from one syllable to the next were considered. The time course of the neural plasticity trace, $\alpha(\tau)$, was modeled as a difference of exponentials, $\alpha(\tau) = \left(e^{-\tau/\tau_{\mathrm{fall}}} - e^{-\tau/\tau_{\mathrm{rise}}}\right)/a$, with $\tau_{\mathrm{rise}} = 1$ ms and $\tau_{\mathrm{fall}} = 40$ ms; a is a normalizing constant that ensures that α has a maximum value of 1.
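The Python sketch below computes the normalizing constant a in closed form (the unnormalized peak of the difference of exponentials, found by setting its derivative to zero), estimates $\bar{\alpha}$ and C numerically on a midpoint grid for a pair of intervals, and applies the piecewise update. It is a numerical illustration under stated assumptions rather than the authors' implementation: α is assumed to be zero for negative lags (consistent with counting only $s^{\mathrm{post}} > s^{\mathrm{pre}}$), the grid resolution is arbitrary, and all function names are hypothetical. Because equal-time pairs are excluded on a discrete grid, C for overlapping intervals comes out slightly below 1/2.

```python
import math

TAU_RISE = 1.0   # ms
TAU_FALL = 40.0  # ms

# Time of the peak of exp(-t/tau_fall) - exp(-t/tau_rise) (derivative set to zero).
T_PEAK = math.log(TAU_FALL / TAU_RISE) * TAU_FALL * TAU_RISE / (TAU_FALL - TAU_RISE)
# Normalizing constant a: the unnormalized peak value, so that max alpha = 1.
A_NORM = math.exp(-T_PEAK / TAU_FALL) - math.exp(-T_PEAK / TAU_RISE)


def alpha(tau: float) -> float:
    """Plasticity trace alpha(tau); assumed zero for negative lags."""
    if tau < 0.0:
        return 0.0
    return (math.exp(-tau / TAU_FALL) - math.exp(-tau / TAU_RISE)) / A_NORM


def alpha_bar_and_c(t_pre: float, tau_pre: float, t_post: float, tau_post: float,
                    steps: int = 200) -> tuple:
    """Midpoint-grid estimate of (alpha-bar, C) for one pair of intervals.

    alpha-bar averages alpha(s_post - s_pre) over pairs with s_post > s_pre;
    C is the fraction of all pairs that satisfy s_post > s_pre.
    """
    total, n_causal, n_pairs = 0.0, 0, 0
    for a in range(steps):
        s_pre = t_pre + (a + 0.5) * tau_pre / steps
        for b in range(steps):
            s_post = t_post + (b + 0.5) * tau_post / steps
            n_pairs += 1
            if s_post > s_pre:
                total += alpha(s_post - s_pre)
                n_causal += 1
    return total / n_causal, n_causal / n_pairs


def delta_weight(k: float, c: float, tau_pre: float, tau_post: float,
                 r_pre: float, rho_post: float, a_bar: float, psi: float) -> float:
    """Delta[post,pre]_ij(n) = k (C tau_pre tau_post) r_pre (rho_post a_bar - psi)."""
    return k * (c * tau_pre * tau_post) * r_pre * (rho_post * a_bar - psi)


def ltp_threshold(b: float, rho_bar_prev: float) -> float:
    """psi_i^post = b^post * rho-bar_i^post(n - 1)."""
    return b * rho_bar_prev
```

For example, `alpha_bar_and_c(0.0, 10.0, 0.0, 10.0)` (condition 1) returns C close to 1/2, whereas `alpha_bar_and_c(0.0, 10.0, 20.0, 10.0)` (condition 2) returns C = 1.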

To reduce spurious correlations, we included a "momentum" term in our update rule (e.g., Rumelhart et al. 1986). The total synaptic change at each time step, $\bar{\Delta}[\mathrm{post},\mathrm{pre}]_{ij}(n)$, was computed by taking the running average of past associations
$$\bar{\Delta}[\mathrm{post},\mathrm{pre}]_{ij}(n) = (1-\gamma)\,\bar{\Delta}[\mathrm{post},\mathrm{pre}]_{ij}(n-1) + \Delta[\mathrm{post},\mathrm{pre}]_{ij}(n)$$

$$[\mathrm{post},\mathrm{pre}]_{ij}(n) = [\mathrm{post},\mathrm{pre}]_{ij}(n-1) + \bar{\Delta}[\mathrm{post},\mathrm{pre}]_{ij}(n)$$
With momentum, the change in strength for the current syllable results from associations occurring over the previous ~1/γ syllables. We use γ = 1/1000. Although it is added for computational reasons, momentum may have some relation to mechanisms of memory consolidation acting on the time scale of hours (Karni et al. 1998).
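A one-synapse Python sketch of the two momentum equations of step 7 follows; the class name is hypothetical and the smoothed change is assumed to start at zero. Summing the geometric series $\sum_m (1-\gamma)^m = 1/\gamma$ shows that, as written, a single isolated association Δ eventually changes the weight by Δ/γ in total, spread over roughly 1/γ syllables.

```python
class MomentumWeight:
    """One synapse [post,pre]_ij updated with momentum (gamma = 1/1000 by default)."""

    def __init__(self, w0: float, gamma: float = 1.0 / 1000.0):
        self.w = w0          # [post,pre]_ij(n - 1)
        self.dbar = 0.0      # smoothed change Delta-bar(n - 1); assumed to start at 0
        self.gamma = gamma

    def step(self, delta: float) -> float:
        # Delta-bar(n) = (1 - gamma) * Delta-bar(n - 1) + Delta(n)
        self.dbar = (1.0 - self.gamma) * self.dbar + delta
        # [post,pre]_ij(n) = [post,pre]_ij(n - 1) + Delta-bar(n)
        self.w += self.dbar
        return self.w


# Illustration with a large gamma so the decay is visible after a few syllables.
m = MomentumWeight(0.0, gamma=0.5)
m.step(1.0)  # 1.0
m.step(0.0)  # 1.5
m.step(0.0)  # 1.75, approaching 1 / gamma = 2.0
```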

8. Update and apply homeostatic mechanisms.

8a. Normalize synaptic strengths. First, we normalized total synaptic strength for each presynaptic neuron
$$[\mathrm{post},\mathrm{pre}]_{ij} = \overline{[\mathrm{post},\mathrm{pre}]}\,\frac{[\mathrm{post},\mathrm{pre}]_{ij}\,N^{\mathrm{post}}}{\sum_i [\mathrm{post},\mathrm{pre}]_{ij}}$$
Then, we normalized for each postsynaptic neuron
$$[\mathrm{post},\mathrm{pre}]_{ij} = \overline{[\mathrm{post},\mathrm{pre}]}\,\frac{[\mathrm{post},\mathrm{pre}]_{ij}\,N^{\mathrm{pre}}}{\sum_j [\mathrm{post},\mathrm{pre}]_{ij}}$$
where $N^{\mathrm{post}}$ and $N^{\mathrm{pre}}$ are the numbers of post- and presynaptic assemblies. $\overline{[\mathrm{post},\mathrm{pre}]}$ determines the average synaptic strength for each projection and was set to control the average amount of input received by assemblies in each population, which in turn controls the degree of inhibitory competition (see METHODS). To set $\overline{[\mathrm{post},\mathrm{pre}]}$, we first determined the desired level of input, then assumed that presynaptic activity was maintained at an average value of 1 and divided by the number of presynaptic assemblies. HVc_RA assemblies received 20 units of input on average, as did HVc_AFP assemblies (when both efference copy and auditory feedback inputs were active); RA and AFP assemblies received 15 units. In HVc_AFP and RA, synaptic strengths were normalized to ensure a fixed ratio of synaptic input: auditory feedback accounted for 20% of the input to HVc_AFP, with the remaining 80% coming from HVc_RA; intrinsic connections provided 50% of the input to RA assemblies, with the other 50% coming from HVc_RA. Thus $\overline{[\mathrm{HA},\mathrm{HR}]} = 0.08 = (1 - 0.2) \times 20/200$, $\overline{[\mathrm{RA},\mathrm{RA}]} = 0.1875 = 0.5 \times 15/40$, and $\overline{[\mathrm{RA},\mathrm{HR}]} = 0.0375 = 0.5 \times 15/200$.
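A pure-Python sketch of step 8a, with weights stored as `w[i][j]` (postsynaptic i, presynaptic j), plus the arithmetic used to set the mean strengths. It performs one pass of the two normalizations in the order given (presynaptic totals, then postsynaptic totals), so after the pass only the postsynaptic totals are exact. The helper names are hypothetical, every row and column is assumed to have a nonzero sum, and the assembly counts 200 (HVc_RA) and 40 (RA) are read off the arithmetic in the text.

```python
from typing import List

Matrix = List[List[float]]  # w[i][j]: postsynaptic assembly i, presynaptic assembly j


def normalize(w: Matrix, w_mean: float) -> Matrix:
    """One pass of the two-step normalization with target mean strength w_mean."""
    n_post, n_pre = len(w), len(w[0])
    # Step 1: scale each presynaptic column j so it sums to w_mean * N_post.
    col = [sum(w[i][j] for i in range(n_post)) for j in range(n_pre)]
    w = [[w_mean * n_post * w[i][j] / col[j] for j in range(n_pre)]
         for i in range(n_post)]
    # Step 2: scale each postsynaptic row i so it sums to w_mean * N_pre.
    row = [sum(r) for r in w]
    return [[w_mean * n_pre * w[i][j] / row[i] for j in range(n_pre)]
            for i in range(n_post)]


def mean_strength(total_input: float, fraction: float, n_pre: int,
                  mean_rate: float = 1.0) -> float:
    """Mean weight giving `fraction` of `total_input` from n_pre assemblies at mean_rate."""
    return fraction * total_input / (mean_rate * n_pre)


ha_hr = mean_strength(20.0, 1.0 - 0.2, 200)  # [HA, HR]: about 0.08
ra_ra = mean_strength(15.0, 0.5, 40)         # [RA, RA]: about 0.1875
ra_hr = mean_strength(15.0, 0.5, 200)        # [RA, HR]: about 0.0375
```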

8b. Update inhibitory strengths. These are designed to keep average activities, $\bar{r}_i$ (see step 9), at the target value $r_{\mathrm{targ}} = 1$: $\Delta G^{\mathrm{post}}_i = k_{\mathrm{inh}}\,(\bar{r}^{\mathrm{post}}_i - r_{\mathrm{targ}})$, where $k^{\mathrm{HR}}_{\mathrm{inh}} = 1 \times 10^{-4}$ for HVc_RA and $k_{\mathrm{inh}} = 2 \times 10^{-5}$ for the other populations. A similar algorithm was used to set the reinforcement thresholds: $\Delta\phi_k = (2.5 \times 10^{-4})(\bar{R}^{\mathrm{syl}}_k - 1)$. Inhibitory changes were also smoothed using momentum (see step 7), but with $\gamma_I = 1/100$, making inhibitory change faster than excitatory change and thus avoiding feedback oscillations.
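Step 8b amounts to a proportional controller acting on a running average. In the Python sketch below, momentum smoothing ($\gamma_I = 1/100$) is applied to the inhibitory update as stated; whether the reinforcement thresholds $\phi_k$ are also smoothed is not stated, so the sketch makes smoothing optional. The class name and initial values are hypothetical.

```python
from typing import Optional


class Homeostat:
    """Proportional homeostatic update, value += rate * (running_avg - target),
    optionally smoothed with momentum gamma (as for the inhibitory strengths G_i)."""

    def __init__(self, value: float, rate: float, target: float = 1.0,
                 gamma: Optional[float] = None):
        self.value = value
        self.rate = rate
        self.target = target
        self.gamma = gamma
        self.smoothed = 0.0  # assumed initial smoothed change

    def step(self, running_avg: float) -> float:
        # Positive when the running average exceeds the target, so G (or phi) grows.
        raw = self.rate * (running_avg - self.target)
        if self.gamma is None:
            self.value += raw
        else:
            self.smoothed = (1.0 - self.gamma) * self.smoothed + raw
            self.value += self.smoothed
        return self.value


# Inhibition onto HVc_RA (rate k^HR = 1e-4, gamma_I = 1/100) and one threshold phi_k.
g_hr = Homeostat(value=0.0, rate=1e-4, gamma=1.0 / 100.0)
phi_k = Homeostat(value=0.0, rate=2.5e-4)
```

Raising G when activity is above target strengthens inhibition, and raising $\phi_k$ when $\bar{R}^{\mathrm{syl}}_k$ exceeds 1 makes reinforcement harder to obtain; both push the averaged quantity back toward its target.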

8c. Update adaptation. Only HVc_AFP included adaptation. The level of adaptation for assembly i, $a_i(t)$, was updated at the end of each epoch (early, middle, late, and gap; Fig. 7). Adaptation increases in proportion to activity and decreases by exponential decay. If an assembly had activity $r_i$ for a period of length τ starting at time t, then $a_i(t + \tau) = \tau\,h\,r_i + e^{-\tau/\tau_{\mathrm{decay}}}\,a_i(t)$, with $\tau^{\mathrm{HA}}_{\mathrm{decay}} = 115$ ms and $h^{\mathrm{HA}} = 0.043\ \mathrm{ms}^{-1}$. For a constant activity level of 1, adaptation would reach a strength of about five input units ($h\,\tau_{\mathrm{decay}} = 4.945$), 25% of the total input during periods when HVc_AFP is receiving both efference copy and auditory input.
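The adaptation update can be checked numerically with the Python sketch below, which iterates the update for constant activity and compares the result with its fixed point, $\tau h r / (1 - e^{-\tau/\tau_{\mathrm{decay}}})$. The actual epoch lengths come from Fig. 7 and are not reproduced here, so the 1-ms and 20-ms epochs are hypothetical; for short epochs the fixed point approaches $h\,\tau_{\mathrm{decay}} \approx 5$, and longer epochs give somewhat larger values.

```python
import math

TAU_DECAY_HA = 115.0  # ms
H_HA = 0.043          # ms^-1


def update_adaptation(a_prev: float, rate: float, tau: float) -> float:
    """a_i(t + tau) = tau * h * r_i + exp(-tau / tau_decay) * a_i(t)."""
    return tau * H_HA * rate + math.exp(-tau / TAU_DECAY_HA) * a_prev


def steady_state(rate: float, tau: float) -> float:
    """Fixed point of update_adaptation for constant activity and epoch length tau."""
    return tau * H_HA * rate / (1.0 - math.exp(-tau / TAU_DECAY_HA))


a = 0.0
for _ in range(5000):  # hypothetical 1-ms epochs at activity 1
    a = update_adaptation(a, 1.0, 1.0)
# a is now within rounding of steady_state(1.0, 1.0), about 4.97
```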

9. Calculate running averages of activity. The running average of activity was calculated using $\bar{r}(n) = (1 - \beta_r)\,\bar{r}(n-1) + \beta_r\,r(n)$, with $\beta_r = 1/10$ for all activity variables. Because reinforcement matches were more variable, slower averaging ($\beta_{R^{\mathrm{syl}}} = 1/100$) was used for $\bar{R}^{\mathrm{syl}}_k$.
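The running average is an exponential moving average; the short Python sketch below (function name hypothetical, starting value assumed to be 0) shows that after about 1/β syllables of constant input it has covered roughly 63% ($1 - e^{-1}$) of the step, i.e., about 10 syllables for activity variables and about 100 for $\bar{R}^{\mathrm{syl}}_k$.

```python
BETA_R = 1.0 / 10.0      # all activity variables
BETA_RSYL = 1.0 / 100.0  # R-bar_k^syl (slower, since matches are more variable)


def running_average(prev: float, current: float, beta: float) -> float:
    """x-bar(n) = (1 - beta) * x-bar(n - 1) + beta * x(n)."""
    return (1.0 - beta) * prev + beta * current


x = 0.0
for _ in range(10):   # 1 / BETA_R syllables of constant activity 1
    x = running_average(x, 1.0, BETA_R)
# x == 1 - 0.9 ** 10 (about 0.65), up to rounding
```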

Initializing variables

Initial excitatory synaptic strengths were set as described in METHODS. To equilibrate homeostatically adjusted variables, 500 syllables were simulated in which no associational learning took place. When reporting our results, these syllables were not included, i.e., syllable number 1 starts after this period.

Simulations with altered parameters

In arriving at our results, many simulations were run in which parameters were varied in a nonsystematic manner (results not reported). To explore the range of model behavior more systematically, we ran simulations in which various parameters were increased by a constant factor c. Figure 14A (circles) shows increased excitatory and inhibitory learning rates, $k^{\mathrm{post,pre}} \rightarrow c \times k^{\mathrm{post,pre}}$ and $k_{\mathrm{inh}} \rightarrow c \times k_{\mathrm{inh}}$. Figure 14A (plus signs) shows increased correlation in initial connections, with the Gaussian noise added when setting initial synaptic strengths changed from 10% to c × 10% of the strength of nonzero synapses (see METHODS). Figure 14C (circles) shows increased LTP threshold in RA, $b^{\mathrm{RA}} \rightarrow c \times b^{\mathrm{RA}}$. Figure 14A (plus signs) shows increased synaptic strengths in RA, $\overline{[\mathrm{RA},\mathrm{RA}]} \rightarrow c \times \overline{[\mathrm{RA},\mathrm{RA}]}$ and $\overline{[\mathrm{RA},\mathrm{HR}]} \rightarrow c \times \overline{[\mathrm{RA},\mathrm{HR}]}$.


    ACKNOWLEDGMENTS

We thank B. Baird, D. Buonomano, C. Linster, A. Krukowski, K. Miller, and members of the Doupe lab for many helpful comments. Special thanks to K. Miller for input and support throughout the project.

This work was supported by the McDonnell-Pew Program in Cognitive Neuroscience (T. W. Troyer), and National Institutes of Health Grants MH-12372 (T. W. Troyer) and MH-55987 and NS-34835 (A. J. Doupe).


    FOOTNOTES

Present address and address for reprint requests: T. Troyer, Dept. of Psychology, University of Maryland, College Park, MD 20742 (E-mail: ttroyer{at}psyc.umd.edu).

The costs of publication of this article were defrayed in part by the payment of page charges. The article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

Received 16 February 2000; accepted in final form 2 May 2000.


    REFERENCES

0022-3077/00 $5.00 Copyright © 2000 The American Physiological Society