¹Department of Psychiatry, ²Department of Physiology, ³W. M. Keck Center for Integrative Neuroscience, and ⁴Sloan Center for Theoretical Neurobiology at UCSF, University of California, San Francisco, California 94143-0444
ABSTRACT
Troyer, Todd W. and Allison J. Doupe. An Associational Model of Birdsong Sensorimotor Learning I. Efference Copy and the Learning of Song Syllables. J. Neurophysiol. 84: 1204-1223, 2000. Birdsong learning provides an ideal model system for studying temporally complex motor behavior. Guided by the well-characterized functional anatomy of the song system, we have constructed a computational model of the sensorimotor phase of song learning. Our model uses simple Hebbian and reinforcement learning rules and demonstrates the plausibility of a detailed set of hypotheses concerning sensory-motor interactions during song learning. The model focuses on the motor nuclei HVc and the robust nucleus of the archistriatum (RA) of zebra finches and incorporates the long-standing hypothesis that a series of song nuclei, the Anterior Forebrain Pathway (AFP), plays an important role in comparing the bird's own vocalizations with a previously memorized song, or "template." This "AFP comparison hypothesis" is challenged by the significant delay that presumptive auditory feedback signals would experience when processed in the AFP. We propose that the AFP does not directly evaluate auditory feedback but instead receives an internally generated prediction of the feedback signal corresponding to each vocal gesture, or song "syllable." This prediction, or "efference copy," is learned in HVc by associating premotor activity in RA-projecting HVc neurons with the resulting auditory feedback registered within AFP-projecting HVc neurons. We also demonstrate how negative feedback "adaptation" can be used to separate sensory and motor signals within HVc. The model predicts that motor signals recorded in the AFP during singing carry sensory information and that the primary role for auditory feedback during song learning is to maintain an accurate efference copy.
The simplicity of the model suggests that associational efference copy learning may be a common strategy for overcoming feedback delay during sensorimotor learning.
INTRODUCTION
The combination of a well-characterized, stereotyped behavior and specialized anatomy makes birdsong an ideal system in which to study the neural basis of motor learning. Moreover, song learning shares important similarities with human speech learning (Doupe and Kuhl 1999). In birds, vocal learning is accomplished in two phases. During an initial, sensory phase, birds listen to and memorize a tutor song, often called the "template" (Konishi 1965; Marler 1964). In a later, sensorimotor phase, birds gradually match their vocalizations to the memorized song, using auditory feedback from their own vocalizations (Fig. 1, A and B). We have constructed a computational model demonstrating that simple associational (Hebbian) learning rules are sufficient to address important problems related to the sensorimotor learning of song. Our model focuses on the zebra finch, a species commonly used in physiological investigations of song learning. Zebra finch song consists of a stereotyped sequence of vocal gestures or "syllables." In this paper, we focus on the learning of the individual syllables. In the following companion paper (Troyer and Doupe 2000), we extend our model to include sequence learning.
The likely neural substrate for sensorimotor learning is the song system, a set of brain nuclei specialized for vocal learning and production (Nottebohm et al. 1976) (Fig. 1C). The motor pathway for song includes the direct projection from nucleus HVc (used as a proper name; Margoliash et al. 1994) to the robust nucleus of the archistriatum (RA). Both nuclei display neural activity time-locked to song production (McCasland 1987; Yu and Margoliash 1996), and lesions in either nucleus disrupt normal song production at all stages of development (Nottebohm et al. 1976; Simpson and Vicario 1990). HVc and RA are also connected by an indirect pathway, the Anterior Forebrain Pathway (AFP). Lesion studies indicate that the AFP is crucial for song learning but is not necessary for normal song production in adults (Bottjer et al. 1984; Scharff and Nottebohm 1991; Sohrabji et al. 1990). These and other data (see Biologically supported assumptions) have led to the "AFP comparison hypothesis," in which the AFP guides sensorimotor learning by transmitting a comparison between auditory feedback from the bird's own vocalizations and the memorized template (Bottjer and Arnold 1986; Doupe 1993; Mooney 1992; Nordeen and Nordeen 1988; Saito and Maekawa 1993). These comparison signals are used to guide learning in the motor pathway at the level of RA (Fig. 1, C and D).
The AFP comparison hypothesis is challenged by a fundamental problem in motor learning, the problem of feedback delay (Lashley 1951; Miall and Wolpert 1996; Miles and Evarts 1979). In zebra finches, presumptive AFP comparison signals would arrive in the motor pathway an estimated ~100 ms after a motor command (see Fig. 2), a latency comparable to the length of a typical song syllable. This delay would cause comparison signals for one syllable to overlap most with the neural activity for the subsequent syllable, and it poses a significant challenge to the notion that AFP comparison signals guide learning in RA (see Bottjer and Arnold 1986). In our model, we retain the hypothesis that the AFP plays an important role in template comparison but propose that, instead of waiting for the actual auditory feedback, an internal prediction or "efference copy" of the auditory feedback is generated within HVc to guide song learning. Therefore, we predict that the signals recorded in the AFP during singing (Hessler and Doupe 1999a,b) are motor signals that also carry sensory information. Furthermore, our model suggests a functional reason why the AFP is located downstream of the motor nucleus HVc (Fig. 1, C and D): use of an efference copy requires that brain areas involved in template comparison receive motor efferents.
Preliminary versions of this work have been presented in conference proceedings (Troyer et al. 1996a,b).
Model and approach
Over the past 25 years, anatomical, lesion, and in vivo physiology studies have yielded a wealth of data concerning the functional anatomy of the song system. However, current hypotheses regarding sensory-motor interactions during song learning lack detail. To explore these issues, we set out to build a computational model of the sensorimotor phase of song learning. Our goal was to determine whether basic theoretical problems in sensorimotor learning could be solved using simple rules of associational plasticity, constrained by the known anatomy of the song circuit. We hoped to guide future experiments by identifying important gaps in our knowledge, as well as to evaluate previous experimental results from a computational point of view.
Our efforts resulted in two closely related models, addressing the problem of song learning at different levels of abstraction. The first is a purely "conceptual model," i.e., a self-consistent set of functional hypotheses conforming to a wide range of experimental results. The functional hypotheses contained in this model constitute the core contribution of our research. The second is a true "computational model" that incorporates these hypotheses into a working computer algorithm. Because knowledge of the song system at the level of local circuits is very limited, implementing this algorithm required a number of specific assumptions that reach beyond current experimental knowledge. As a result, several aspects of the computational model are not well constrained by biology. Moreover, we made a number of simplifying assumptions to ensure that simulations could be run in a reasonable amount of time. However, the computational model played an important role in exploring our initial functional ideas and serves to illustrate our core conceptual hypotheses. Perhaps more importantly, the construction of a working computational algorithm demonstrates that our hypotheses are mutually consistent and provides a theoretical demonstration that they are sufficient to account for important aspects of song learning. This dual approach not only highlights general problems of sensorimotor learning and generates testable predictions at a functional level but also provides a framework for understanding how specific biological mechanisms may contribute to their solution. These models are only a first step and, of necessity, contain many simplifications. However, taken together, they constitute the most detailed set of hypotheses to date regarding the interaction of sensory and motor signals during the sensorimotor phase of song learning.
In this section, we present the justification for our working biological assumptions. We then describe the main problems addressed by our model and outline the key elements of our proposed solution. Finally, we present our conceptual model, which describes our functional hypotheses in greater detail. In the METHODS section, we outline the theoretical assumptions incorporated into our working computational model, including a description of the network architecture and the simple encoding scheme used to represent song. In the RESULTS section, we present quantitative results generated by our computational model. Details of the computer algorithm are confined to the APPENDIX.
Biologically supported assumptions
Although the nature of template memorization is largely unknown, various lines of evidence suggest that the AFP may transmit a comparison between the bird's own vocalizations and the memorized tutor song. We call such signals "template comparison signals." Initial evidence suggesting a role for the AFP in template comparison came from lesion experiments: AFP lesions in juvenile zebra finches disrupt song learning, whereas lesions in adult birds have little effect on normal song production (Bottjer et al. 1984; Nottebohm et al. 1976; Scharff and Nottebohm 1991; Sohrabji et al. 1990). Further experiments have shown that the lateral portion of the magnocellular nucleus of the anterior neostriatum (LMAN), the output nucleus of the AFP, appears to be necessary any time the song changes, even in adulthood (Brainard and Doupe 2000; Morrison and Nottebohm 1993; Williams and Mehta 1999). Other experiments suggest that circuitry within the AFP may function as a template: AFP neurons develop song-selective auditory responses during song learning (Doupe 1997; Solis and Doupe 1997), and a subset of these neurons respond vigorously to the tutor song (Solis and Doupe 1997, 1999). Using a more direct approach, Basham et al. (1996) showed that local blockade of N-methyl-D-aspartate (NMDA) receptors in the AFP specifically during song memorization disrupts normal song learning.
Within the framework of our model, the simplest hypothesis is that the AFP not only transmits a template comparison signal but also computes the match between the efference copy and the memorized template, i.e., that the AFP is the storage site for the tutor template. We did not attempt to model the AFP circuitry that subserves template comparison but rather viewed the AFP as a "black box" that performs the necessary calculations. An alternative hypothesis that is still consistent with the basic structure of our model is that the AFP transmits a template comparison signal, but that memorized template information is stored closer to the auditory periphery than the AFP (see DISCUSSION).
Additional studies into the functional anatomy of the song system have shown that the neurons that project to RA and those that project to the AFP form distinct populations within HVc (Nordeen and Nordeen 1988). We denote these two populations HVc_RA and HVc_AFP. While the evidence is indirect, these two populations are likely to be highly interconnected (Fortune and Margoliash 1995; Vu and Lewicki 1994). Various data suggest that activity within HVc_RA neurons is more closely tied to motor behavior, whereas activity within HVc_AFP neurons is more closely tied to auditory input (Katz and Gurney 1981; Kimpo and Doupe 1997; Lewicki 1996; Saito and Maekawa 1993; but see Doupe and Konishi 1991; Vicario and Yohay 1993). Moreover, experiments in singing birds suggest that the motor pathway is arranged hierarchically, with RA encoding the detailed motor program for each song syllable and the central pattern generator for song sequence lying upstream of RA, perhaps in HVc (Vu et al. 1994; Yu and Margoliash 1996).
The main biologically supported assumptions that are incorporated into the model are summarized in Table 1.
The final data included in the model were the estimated latencies between various song nuclei (Fig. 2A). We included only the best-studied neural pathways in the song system, as the functional significance of other signaling pathways remains unclear (see Foster and Bottjer 1998; Foster et al. 1997; Striedter and Vu 1998; Vates et al. 1997). We used 50 ms for the latency from HVc premotor activity to vocal output (McCasland 1987; McCasland and Konishi 1981) and 15 ms for auditory latencies to HVc (Margoliash and Fortune 1992). Estimating the processing time through the AFP during song was more problematic, since activity in LMAN, the output nucleus of this pathway, is quite variable. We used 45 ms for the latency to LMAN (A. J. Doupe 1997; personal observations). Subtracting 15 ms for the latency to HVc and adding 10 ms for the delay between LMAN and RA, we obtained a processing time through the AFP of roughly 40 ms. Simulated syllables were 80 ms long with a 35-ms gap between syllables (Fig. 2B), typical of mean values for zebra finch song (M. Brainard, personal communication; Scharff and Nottebohm 1991; Zann 1993).
These timing data suggest that, on average, presumptive template
comparison signals from the AFP will have the greatest overlap with
motor activity for the subsequent syllable (Fig. 2C, dotted
box).
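The latency bookkeeping above can be checked with a few lines of arithmetic. The sketch below is our own illustration, not part of the published model code. It makes three assumptions. Times are measured from the onset of HVc premotor activity for one syllable. RA activity occupies the same 80-ms premotor window, matching the tonic activity assumption described in the METHODS. The returning comparison signal lasts as long as the syllable that produced it. All constant and function names are hypothetical.

```python
# Latencies and syllable timing (ms) taken from the text; names are hypothetical.
HVC_TO_VOCAL = 50      # HVc premotor activity -> vocal output
AUDITORY_TO_HVC = 15   # vocal output -> auditory response in HVc
AUDITORY_TO_LMAN = 45  # latency to LMAN during song (estimate)
LMAN_TO_RA = 10        # LMAN -> RA
SYLLABLE = 80          # simulated syllable duration
GAP = 35               # silent gap between syllables


def afp_processing_delay() -> int:
    """HVc -> AFP -> RA delay: (LMAN latency - HVc latency) + LMAN->RA delay."""
    return AUDITORY_TO_LMAN - AUDITORY_TO_HVC + LMAN_TO_RA


def total_feedback_delay() -> int:
    """Premotor onset in HVc -> arrival of the AFP comparison signal in RA."""
    return HVC_TO_VOCAL + AUDITORY_TO_HVC + afp_processing_delay()


def overlap(a: tuple, b: tuple) -> int:
    """Length of the intersection of two half-open intervals [start, end)."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


def comparison_overlaps(n_syllables: int = 3) -> list:
    """Overlap (ms) of syllable 0's comparison signal with each syllable's RA window."""
    period = SYLLABLE + GAP
    onset = total_feedback_delay()
    signal = (onset, onset + SYLLABLE)
    return [overlap(signal, (k * period, k * period + SYLLABLE))
            for k in range(n_syllables)]


if __name__ == "__main__":
    print(afp_processing_delay())  # 40
    print(total_feedback_delay())  # 105
    print(comparison_overlaps())   # [0, 70, 0]
```

Under these assumptions, the comparison signal for one syllable spans 105-185 ms. It misses its own syllable entirely and overlaps the next syllable's window (115-195 ms) for 70 ms.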
Problems addressed
In this paper, we address the problem of learning a collection of motor representations corresponding to song syllables stored within a memorized template. For simplicity, we do not address learning the detailed temporal structure within each syllable, nor learning the length of syllables and inter-syllable gaps. Our model rests on two key assumptions: 1) song learning is accomplished using simple associational learning rules and 2) the AFP guides song learning by transmitting a signal that carries information about the match between the bird's auditory feedback and a stored template. Here, we present a brief outline of the main problems addressed by our model and the key functional hypotheses that underlie our solutions (see Table 2). More detail regarding our hypothesized solutions is presented in the form of a conceptual model (see Conceptual model) and a computational model (see RESULTS). The presentation of both models is structured according to the following outline.
The first problem we address is auditory feedback delay: presumptive AFP comparison signals would arrive in RA during the neural activity for the next syllable (Fig. 2C). We hypothesize that the AFP does not directly evaluate auditory feedback but instead receives an internally generated prediction of the sensory feedback resulting from song-related motor activity (Table 2, number 1). Such an internal prediction requires a transformation from motor to sensory coordinates. It has been termed "efference copy" (von Holst and Mittelstaedt 1980), "corollary discharge" (Sperry 1950), or the result of a "forward model" (reviewed in Jordan 1995; Miall and Wolpert 1996). We will use the term efference copy. Sensory signals resulting from motor behavior have been termed sensory "reafference" (von Holst and Mittelstaedt 1980). We further hypothesize that the motor → sensory efference copy mapping develops between the two populations of HVc projection neurons (Table 2, number 2). To learn this mapping, our associational plasticity rule must be "temporally asymmetric," i.e., presynaptic activity must be followed by postsynaptic activity to induce plasticity (Table 2, number 3).
The second problem we address is the nature of AFP-guided syllable learning in RA. We make two functional hypotheses. First, we hypothesize that syllable learning is guided by nonspecific reinforcement signals provided by the AFP that modulate the degree of ongoing associational plasticity throughout RA (Table 2, number 4; see Sutton and Barto 1998 for an overview of reinforcement learning). This hypothesis is motivated by the fact that nonspecific reinforcement signals, while generated by a match to a sensory template, do not have to be directed toward specific patterns of RA motor neurons. As a result, no sensory → motor mapping is required to guide learning. Second, we hypothesize that synapses intrinsic to RA play an important role in storing syllable representations (Table 2, number 5). This hypothesis was motivated by the need to learn a number of discrete patterns of neural activity corresponding to the syllables in the tutor template. It is consistent with estimates that up to 85% of synapses in RA come from local collaterals of other RA neurons (Herrmann and Arnold 1991). Theoretical models have shown that recurrent connectivity is well suited to stabilizing such patterns (e.g., Hopfield 1984). Moreover, if the representation for individual syllables is encoded in the pattern of intrinsic RA synapses, plasticity in the synapses connecting HVc and RA can alter the sequence of syllables produced, with only minor disruption to the representation for each individual syllable (see Troyer and Doupe 2000).
The third problem we address results from the competing requirements of both learning and using the efference copy signal. Learning an efference copy mapping by associating motor activity with delayed auditory feedback implies that auditory inputs induce significant levels of activity. However, when using the short-latency efference copy signals to guide syllable learning, the strong auditory inputs will interfere with the efference copy signal. We address this problem by assuming that the auditory feedback signal is relatively weak and/or that the response of HVc_AFP neurons is strongly adapting (Table 2, number 6).
Conceptual model
Our model focuses on four neural populations (Fig. 3): nucleus RA in the motor pathway, separate populations of HVc projection neurons projecting to RA and the AFP (Nordeen and Nordeen 1988), and a single population representing the output of the AFP. Because we do not explicitly model nuclei downstream of RA, activity in RA represents the motor output of the model. In this paper, we explore the functional consequences of associational plasticity in three sets of connections: HVc_RA → HVc_AFP, HVc_RA → RA, and intrinsic RA → RA connections.
Our model does not address the learning of syllable timing. We assume that timing is provided by rhythmically clocked bursts of premotor activity arriving in HVc_RA, with the duration of each burst controlling the duration of premotor activity and hence the length of song syllables (Fig. 2B). While the source of the premotor drive is not explicitly modeled, nucleus uvaeformis (Uva) and/or nucleus interfacialis (NIf) are likely candidates (McCasland 1987; Striedter and Vu 1998; Williams and Vicario 1993). Input from the forebrain nucleus medial MAN is also a possible source. Although the timing of this drive is fixed, we assume that HVc_RA neurons receive varying magnitudes of drive. These magnitudes are generated independently for each HVc_RA neuron and each vocalization produced by the model. Thus, HVc_RA produces random patterns of premotor activity that are independent from one syllable to the next. The model's task is to use template comparison signals generated by the AFP to reorganize the connections in the motor pathway so that 1) random HVc_RA activity is converted into a handful of stereotyped patterns of RA motor activity, and 2) these stereotyped patterns of RA activity lead to vocal output matched to the memorized template. Note that HVc_RA activity becomes ordered when we address the problem of sequence learning (Troyer and Doupe 2000).
PROBLEM 1: AUDITORY FEEDBACK DELAY.
To address the problem of feedback delay, we hypothesize that an efference copy mapping is learned between the two populations of HVc projection neurons (Table 2, numbers 1 and 2). Since the connections in the motor pathway are initially unstructured, the random patterns of HVc_RA activity lead to a random exploration of motor space (cf. Bullock et al. 1993; Kuperstein 1988; Salinas and Abbott 1995). Activity flows down the motor pathway (McCasland 1987) and returns to HVc_AFP as auditory feedback (Fig. 4A, dark lines). While the exact form of the learning is not crucial for our model, it is important that associational learning is temporally asymmetric (Table 2, number 3), i.e., synaptic strengths increase only when presynaptic activity precedes postsynaptic activity (Bi and Poo 1998; Debanne et al. 1998; Gustafsson et al. 1987; Hebb 1949; Markram et al. 1997). By strengthening synapses onto neurons that are likely to fire in the near future, temporally asymmetric "Hebbian" learning strengthens synaptic inputs that "anticipate" any postsynaptic activity that regularly follows presynaptic spiking (cf. Blum and Abbott 1996; Gerstner and Abbott 1997). In our model, auditory feedback to HVc_AFP neurons encoding the sensory aspects of a particular vocal gesture will follow spiking in HVc_RA neurons encoding motor aspects of that gesture. Associational learning then strengthens the synapses from that (presynaptic) HVc_RA neuron onto the corresponding (postsynaptic) neurons in HVc_AFP (Fig. 4A, white arrow). After this motor → sensory mapping is learned, activity within HVc_RA motor neurons will drive, with short latency, the HVc_AFP neurons encoding the corresponding sensory representation. This short-latency motor activity in HVc_AFP constitutes a sensory prediction of the auditory reafference. This efference copy can then be passed on to the AFP and used to guide learning in RA. Note that efference copy learning occurs within HVc and proceeds without reference to the tutor template stored in the AFP. Using efference copy in this way splits the total feedback delay for AFP comparison signals to return to RA into two shorter delays: the auditory feedback delay of 65 ms to HVc (Fig. 4A) and the 40-ms processing delay from HVc through the AFP (Fig. 4B).
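The core of the efference copy hypothesis can be illustrated with a toy rate model. The hypothesis is that a temporally asymmetric Hebbian rule, applied to HVc_RA → HVc_AFP synapses, learns the motor → sensory mapping. This is a minimal sketch under our own simplifying assumptions, not the algorithm in the APPENDIX:

* Time is divided into discrete bins, and auditory feedback arrives exactly one bin after the motor command that caused it.
* The HVc_RA → RA → feedback chain is a fixed random linear map `M`, with the one-to-one RA-to-HVc_AFP feature mapping folded in.
* The presynaptic term is mean-subtracted. This covariance variant of the Hebbian rule can also weaken synapses, unlike the potentiation-only rule in the text.

All function and parameter names are hypothetical.

```python
import numpy as np


def learn_efference_copy(n_hvc=20, n_feat=40, n_bins=5000, eta=1e-3,
                         forward=True, seed=0):
    """Toy temporally asymmetric Hebbian learning of HVc_RA -> HVc_AFP weights.

    Returns the learned weights W (n_feat x n_hvc) and the fixed
    motor-to-sensory map M that the efference copy should approximate.
    """
    rng = np.random.default_rng(seed)
    # Hypothetical unstructured motor pathway: feedback to HVc_AFP = M @ (HVc_RA activity).
    M = rng.random((n_feat, n_hvc))
    # Independent random premotor patterns in HVc_RA, one per time bin.
    X = rng.random((n_bins, n_hvc))
    # Auditory feedback arrives one bin after the motor command that produced it.
    F = np.zeros((n_bins, n_feat))
    F[1:] = X[:-1] @ M.T
    # Covariance form of the rule: subtract the mean presynaptic rate.
    Xc = X - X.mean(axis=0)
    post = F[1:n_bins - 1]  # postsynaptic feedback at bins 1 .. n_bins-2
    # forward=True pairs each feedback bin with the presynaptic bin BEFORE it;
    # forward=False pairs it with the bin AFTER it (reversed temporal order).
    pre = Xc[0:n_bins - 2] if forward else Xc[2:n_bins]
    W = eta * post.T @ pre
    return W, M
```

Successive premotor patterns are independent. Pairing feedback with the preceding motor pattern (`forward=True`) therefore yields weights roughly proportional to `M`, so HVc_RA activity alone predicts its own sensory consequences. Pairing feedback with the following pattern leaves only small, unstructured weights. This is why the text requires temporal asymmetry.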
PROBLEM 2: SYLLABLE LEARNING IN RA.
To guide syllable learning, the AFP evaluates the efference copy and transmits a reinforcement signal to RA (Table 2, number 4). This nonspecific reinforcement signal is assumed to modulate the degree of ongoing associational plasticity throughout RA. An efference copy that is well matched to the tutor song results in a large plasticity signal in RA neurons that are significantly activated, leading to a potentiation of recently activated synapses; a poor match evokes small potentiation or depression. Since a good match to the tutor song occurs when the RA neurons that encode a single tutor syllable are co-active, reinforcement leads to the development of strong connections between RA neurons encoding the same tutor syllable (Table 2, number 5). Reinforcement also reorders the connections from HVc_RA → RA (see RESULTS). These patterns of connectivity result in a strong tendency for RA to produce coherent patterns of motor activity matched to the template, i.e., the tutor syllables have become "attractors" for the neural dynamics within RA (see, e.g., Amit 1989).
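The reinforcement idea can be sketched as a reward-modulated Hebbian rule on the intrinsic RA → RA weights: ΔJ = η (r − r̄) a aᵀ. Here a is the RA activity vector, r is the AFP's nonspecific reinforcement signal, and r̄ is a running average of r. This is our illustration only, with several hypothetical choices:

* RA activity on each trial is a random binary pattern.
* The efference copy is accurate, so the AFP sees RA activity directly through the one-to-one feature mapping.
* The template match r is the largest fraction of any single tutor syllable's eight features that are active.

The HVc_RA → RA connections and all homeostatic mechanisms are omitted, and all names are hypothetical.

```python
import numpy as np

N_FEAT, N_SYL, PER_SYL = 40, 5, 8  # 5 tutor syllables (A-E) x 8 features each


def afp_reinforcement(a):
    """Hypothetical template match: best fraction of one syllable's features active."""
    counts = a.reshape(N_SYL, PER_SYL).sum(axis=1)
    return counts.max() / PER_SYL


def reinforce_ra(n_trials=5000, p_active=0.2, eta=0.01, tau=0.01, seed=0):
    """Reward-modulated Hebbian learning of intrinsic RA -> RA weights J."""
    rng = np.random.default_rng(seed)
    J = np.zeros((N_FEAT, N_FEAT))
    r_bar = 0.0  # running baseline of reinforcement; warms up over ~1/tau trials
    for _ in range(n_trials):
        a = (rng.random(N_FEAT) < p_active).astype(float)  # random RA pattern
        r = afp_reinforcement(a)
        # Nonspecific signal: the same scalar (r - r_bar) gates every co-active pair.
        J += eta * (r - r_bar) * np.outer(a, a)
        r_bar += tau * (r - r_bar)
    np.fill_diagonal(J, 0.0)  # ignore self-connections
    return J
```

Averaged over many trials, pairs of assemblies belonging to the same tutor syllable become more strongly coupled than pairs from different syllables. Their co-activation tends to coincide with better template matches. No sensory → motor mapping is involved, because every active synapse sees the same scalar signal.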
PROBLEM 3: SEPARATING MOTOR AND SENSORY SIGNALS IN HVC.
In our model, HVc_AFP neurons receive two distinct inputs: auditory feedback, which drives efference copy learning, and motor input from HVc_RA, which carries the efference copy used for AFP-driven song learning. While necessary for efference copy learning, the delayed auditory signal can interfere with the efference copy signal used to guide learning. We propose two strategies for separating sensory and motor signals within HVc_AFP (Table 2, number 6).

First, the auditory feedback signal is assumed to be significantly weaker than the efference copy signal. Hence, auditory feedback only weakly perturbs the efference copy, which can remain sufficiently accurate to guide syllable learning. However, weak auditory feedback can still guide efference copy learning. Over the course of multiple syllables, it provides a consistent association between HVc_RA motor activity and the resulting weak sensory activation.

The second strategy is based on the cancellation of auditory feedback signals in HVc_AFP by "adaptation." Specifically, adaptation in the HVc_AFP circuitry results in a "negative after-image" of any given pattern of HVc_AFP activity (Fig. 5). This after-image has a decay time (~100 ms) similar to the length of a typical song syllable (see APPENDIX for implementation). A variety of biological mechanisms could provide this kind of adaptation, e.g., spike-triggered or voltage-dependent intrinsic currents and/or slow feedback inhibition. Such mechanisms have been shown to be present within HVc (Dutar et al. 1998; Kubota and Saito 1991; Kubota and Taniguchi 1998; Schmidt and Perkel 1998). Because the efference copy arrives in HVc_AFP with a shorter delay than the auditory feedback, the after-image of the efference copy will counteract the corresponding auditory reafference. That is, HVc_AFP neurons strongly activated by efference copy input from HVc_RA will be in an adapted state by the time that the corresponding patterns of delayed auditory feedback arrive in HVc_AFP. Note that an inaccurate efference copy will lead to incomplete cancellation of auditory feedback, and interference from this uncanceled feedback will further degrade the efference copy. However, associations between the uncanceled feedback signal and the HVc_RA motor activity that gave rise to it will lead to new plasticity that improves the quality of future efference copy predictions. Details of how this cancellation mechanism works in the context of our computer algorithm are presented in the RESULTS.
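The cancellation-by-adaptation idea can be illustrated for a single HVc_AFP assembly. The sketch below is a hypothetical implementation, not the one in the APPENDIX. It uses a rectified rate unit whose output is reduced by an adaptation variable. That variable follows recent output with a 100-ms time constant (the decay time quoted above) and is integrated with 1-ms Euler steps. Efference copy input is present throughout the 80-ms syllable. Auditory feedback arrives 65 ms after premotor onset and lasts 80 ms; we assume it is weaker, with amplitude 0.5 versus 1.0. We compare an assembly that received the matching efference copy with one that did not. All names and amplitudes are assumptions.

```python
import numpy as np


def hvc_afp_response(ec_amp, aud_amp, tau_adapt=100.0, syl_len=80,
                     fb_delay=65, t_max=200, dt=1.0):
    """Rate of one HVc_AFP assembly with subtractive adaptation (hypothetical model)."""
    n = int(t_max / dt)
    t = np.arange(n) * dt
    ec = np.where(t < syl_len, ec_amp, 0.0)  # short-latency efference copy input
    aud = np.where((t >= fb_delay) & (t < fb_delay + syl_len), aud_amp, 0.0)  # delayed feedback
    a = 0.0  # adaptation variable
    y = np.zeros(n)
    for k in range(n):
        # Inputs are summed, adaptation is subtracted, and rates cannot go negative.
        y[k] = max(0.0, ec[k] + aud[k] - a)
        # Adaptation relaxes toward the current output with time constant tau_adapt.
        a += dt / tau_adapt * (y[k] - a)
    return t, y
```

Consider the 80-145 ms window, when only delayed feedback drives the unit. The assembly that carried the efference copy responds much less than the one that did not, because it is still adapted. Feedback that the efference copy did not predict therefore stands out, and that residual is what drives further efference copy learning.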
METHODS
The main assumptions that were necessary to construct our computational algorithm are summarized in Table 3 and are discussed below. Only the subsections explaining our method of neural encoding (see Neural encoding, Fig. 6) and the nature of HVc_AFP activity (see Tonic activity patterns, Fig. 7) are necessary for understanding the main computational results presented in the RESULTS. Other subsections describe issues of mainly theoretical interest. In the final subsection of the METHODS, we provide formulas for characterizing the developmental time course of the model. Details of the computational algorithm are presented in the APPENDIX. The assumptions outlined in Table 3 are not crucial for the main predictions of our model; alternative algorithms that implement our functional hypotheses for song learning are possible. Our particular algorithm should be seen as a first approximation, one that allows us to explore associational learning between patterns of sensory and motor activity on the time scale of tens to hundreds of milliseconds.
Each simulation consisted of repeated iterations of a computer subroutine that 1) calculated activity patterns related to a single syllable output by the model, 2) applied our synaptic plasticity rule, and 3) updated the various homeostatic mechanisms in the model. The details of the algorithm and the specification of model parameters are given in the APPENDIX. In most simulations, the subroutine was iterated for 25,000 syllables, ~5,000 more than were typically needed for model output to become stereotyped. When performance was degraded by changing parameters (see APPENDIX), simulations were extended to 50,000 syllables, but even then the output sometimes lacked stereotypy. Computer simulations were written in the MATLAB simulation environment (version 5.3; The MathWorks, Natick, MA). Typical simulations took ~2 h on a 400-MHz Pentium II processor.
Neural encoding
Activity in the model was represented by the output of a number of neural "units." Each of these units is meant to represent the activity within a network of connected neurons or "cell assembly" (Hebb 1949); hereafter, we refer to them as "assemblies." Given the lack of data concerning the neural code for vocal gestures in the song system, we sought the simplest encoding scheme that could support associational learning (Table 3, number 1). Each vocal gesture produced by the model is viewed as a combination of 40 abstract "vocal features." Each RA assembly represents motor-related aspects of one feature, and each HVc_AFP assembly represents sensory-related aspects of one feature. Because of this one-to-one mapping of auditory and motor features, motor activity in a given RA assembly leads to auditory feedback input to the unique corresponding assembly in HVc_AFP. The tutor song consisted of five syllables, within the normal range for zebra finch song (3-9 syllables; Price 1979). We denote these syllables by the letters A-E and assumed that each tutor syllable was encoded by a distinct set of assemblies. This allowed us to number vocal features consecutively, i.e., tutor syllable A contains vocal features 1-8, tutor syllable B contains features 9-16, etc. (Fig. 6A). The tutor template is stored in the AFP, with tutor syllables encoded in the connections from HVc_AFP: each AFP assembly corresponds to a single tutor syllable and receives input from the HVc_AFP assemblies representing the auditory features comprising that syllable (Fig. 6B). Connections related to syllable B are shown as an example. Our choice of this very simple representation was guided by the following considerations: 1) due to the complexity of the network and finite computational resources, our model contains only a limited number of assemblies; 2) since learning correlated patterns with Hebbian learning rules is a largely unsolved theoretical problem, we chose an encoding scheme in which uncorrelated patterns of motor activity result in uncorrelated patterns of sensory feedback; 3) our encoding scheme ensures decorrelation in the motor → sensory mapping even for assemblies using nonlinear input-output functions.
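The encoding scheme and the template connections of Fig. 6B can be written down directly. In this hypothetical sketch, features are indexed from 0, so the text's features 1-8 become indices 0-7. Each AFP assembly receives unit-weight input from the eight HVc_AFP assemblies of its tutor syllable. The AFP's response to a pattern of HVc_AFP activity is taken to be a simple weighted sum. The unit weights, the linear readout, and all names are our assumptions.

```python
import numpy as np

SYLLABLES = "ABCDE"                    # five tutor syllables
PER_SYL = 8                            # vocal features per tutor syllable
N_FEATURES = len(SYLLABLES) * PER_SYL  # 40 features, one per RA / HVc_AFP assembly


def feature_indices(syllable):
    """0-based feature indices of a tutor syllable (A -> 0..7, B -> 8..15, ...)."""
    s = SYLLABLES.index(syllable)
    return np.arange(s * PER_SYL, (s + 1) * PER_SYL)


def template_weights():
    """HVc_AFP -> AFP weights: each AFP assembly reads its own syllable's features."""
    W = np.zeros((len(SYLLABLES), N_FEATURES))
    for s, name in enumerate(SYLLABLES):
        W[s, feature_indices(name)] = 1.0
    return W


def afp_response(hvc_afp_activity):
    """Linear readout of a 40-element HVc_AFP activity vector by the 5 AFP assemblies."""
    return template_weights() @ hvc_afp_activity


# Example: HVc_AFP activity for a perfectly produced syllable B.
x = np.zeros(N_FEATURES)
x[feature_indices("B")] = 1.0
print(afp_response(x))  # [0. 8. 0. 0. 0.]
```

The five feature sets do not overlap, so the binary patterns for different tutor syllables are orthogonal. This is one concrete way to meet consideration 2) above.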
Initially, all connections in the motor pathway are unstructured. Thus, random activity in HVc_RA leads to random motor activity in RA (Fig. 6C). The model's task is to 1) compare sensory signals with the stored template in the AFP, and 2) use the resulting comparison signals to guide plasticity in the motor pathway so that random HVc_RA activity is converted to stereotyped patterns of RA activity matched to the tutor song.
Tonic activity patterns
For simplicity, we assume that song-related activity is encoded
by the neural firing rates averaged over the course of each song
syllable. Thus, the activity within each of the four neural populations
is modeled as a vector of firing rates, with one entry for each
assembly in the population. For all populations except HVc_AFP, firing
rates are assumed to be constant during the period of premotor drive
for each syllable and zero during the gap between syllables. In
HVc_AFP, we divided each syllable into four time epochs depending on
the combination of efference copy (related to the current syllable) and
auditory feedback input received during that syllable (Fig. 7). During
the early part of each syllable (marked E), HVc_AFP receives
efference copy input from HVc_RA that relates to the current syllable,
while the sensory input is due to delayed auditory feedback from the
previous syllable. The middle portion of each syllable (marked
M) corresponds to the period of silence in the delayed
feedback. During this period, HVc_AFP receives efference copy input
only. During the late part of the syllable (marked L), the
efference copy and auditory inputs correspond to the same syllable.
Finally, during the "gap" period between bursts of HVc_RA activity
(marked G), HVc_AFP receives only auditory input. During the
epochs when efference copy and auditory feedback inputs overlap, the
two sources of input were simply summed. For computational and
conceptual simplicity, we chose not to propagate this subdivision of
activity to the AFP. The efference copy activity that was passed on to
the AFP was calculated from the average activity in HVc_AFP during the
early and middle portions of the syllable. Late and gap portions were excluded for two reasons. First, RA activity generated during the
current syllable contributes to the late and gap portions of HVc_AFP
activity. In our sequence learning model (Troyer and Doupe 2000), the AFP not only provides a reinforcement signal to RA but also affects the pattern of RA activity. Excluding the late and gap
portions of HVc_AFP activity from the efference copy therefore prevents RA output
from contributing to RA input during the same syllable via the
RA → HVc_AFP → AFP → RA feedback loop. Second, the exclusion prevents auditory
feedback from the current syllable from contributing acutely to the AFP
reinforcement signal. We will view the combined early and middle
activity signal as the efference copy passed on to the AFP, although it
may include auditory feedback from the previous syllable.
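The epoch bookkeeping above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code. The function names are hypothetical, and the sketch assumes threshold-linear HVc_AFP activity with a zero threshold and epochs of equal duration, so that the time average over the early and middle epochs reduces to a simple mean.

```python
import numpy as np

def hvc_afp_epoch_inputs(ec_current, fb_previous, fb_current):
    """Summed HVc_AFP input in each epoch of one syllable (E, M, L, G).

    ec_current:  efference copy input from HVc_RA for the current syllable
    fb_previous: delayed auditory feedback from the previous syllable
    fb_current:  delayed auditory feedback from the current syllable
    """
    return {
        "E": ec_current + fb_previous,  # early: efference copy + previous syllable's feedback
        "M": ec_current.copy(),         # middle: silent period in the delayed feedback
        "L": ec_current + fb_current,   # late: both inputs refer to the current syllable
        "G": fb_current.copy(),         # gap: auditory input only
    }

def efference_copy_for_afp(epoch_inputs):
    """Average early and middle activity; late and gap epochs are excluded.

    Assumes threshold-linear activity (zero threshold) and equal epoch durations.
    """
    early = np.maximum(0.0, epoch_inputs["E"])
    middle = np.maximum(0.0, epoch_inputs["M"])
    return 0.5 * (early + middle)

epochs = hvc_afp_epoch_inputs(np.array([1.0, 0.0]), np.array([0.0, 2.0]), np.array([3.0, 0.0]))
print(efference_copy_for_afp(epochs))  # [1. 1.]
```

Feedback from the current syllable (the 3.0 in the example) never reaches the AFP signal, while feedback from the previous syllable does, matching the caveat in the last sentence above.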
Plasticity rule
We used a simple model of associational learning. Synaptic
projections are in principle "all-to-all," i.e., associational learning takes place between all relevant combinations of pre- and
postsynaptic assemblies. Assemblies become functionally
disconnected when associational learning drives connection strengths to
zero. While our learning rule is meant to encompass the many potential mechanisms of associational plasticity in the song system, its form is based on analogies with NMDA
receptor-dependent long-term potentiation (LTP; Malenka and
Nicoll 1993; Table 3, number 3). In the equation below
we use $r_{\mathrm{pre}}(t)$ and $r_{\mathrm{post}}(t)$ to denote the
activity levels of the pre- and postsynaptic assemblies at time
$t$. Each presynaptic spike (at time $t_{\mathrm{pre}}$) was assumed to give rise to a
postsynaptic "plasticity trace," $\varepsilon(t - t_{\mathrm{pre}})$, analogous to the amount of
NMDA-receptor binding. The shape of the function $\varepsilon$ determines the
time window for neural plasticity (see APPENDIX). This
plasticity trace is multiplied by postsynaptic activity to yield a
"plasticity signal," $\varepsilon(t - t_{\mathrm{pre}})\,r_{\mathrm{post}}(t)$,
analogous to postsynaptic calcium concentration. Input from the AFP is
assumed to give a reinforcement signal $R$ that modulates the
plasticity signal in all RA assemblies ($R$ is set to a
constant value of 1 in HVc). Plasticity signals above a threshold value $\theta$
increase synaptic strength (LTP); signals below $\theta$ give rise to
long-term depression (LTD; Cummings et al. 1996; Hansel et al. 1997; Lisman 1989).
$\theta$ is a "sliding threshold" that depends on the average amount of
activity in the postsynaptic cell (Abraham and Bear 1996; Bienenstock et al. 1982; Sejnowski 1977). Thus, the change in synaptic strength resulting from
postsynaptic activity at time $t$ and presynaptic activity at
time $t_{\mathrm{pre}}$ is proportional to the
following quantity (see APPENDIX):

$$R\,\varepsilon(t - t_{\mathrm{pre}})\,r_{\mathrm{post}}(t) - \theta$$
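A minimal sketch of the rule for a single pre/post pairing follows. It compares the reinforcement-modulated plasticity signal, R · ε(t − t_pre) · r_post(t), against the threshold θ, as described above. The exponential shape of the trace, its 20 ms time constant, the low-pass form of the sliding threshold, and the choice to ignore pairings with t < t_pre are all illustrative assumptions; the actual kernel and threshold dynamics are specified in the APPENDIX and are not reproduced here.

```python
import math

def plasticity_trace(dt, tau=20.0):
    """Hypothetical trace eps(dt): exponential decay after a presynaptic spike (ms)."""
    return math.exp(-dt / tau) if dt >= 0 else 0.0

def plasticity_quantity(t, t_pre, r_post, R, theta, tau=20.0):
    """R * eps(t - t_pre) * r_post(t) - theta; positive -> LTP, negative -> LTD.

    Assumption: pairings with t < t_pre (no trace yet) contribute nothing.
    """
    if t < t_pre:
        return 0.0
    return R * plasticity_trace(t - t_pre, tau) * r_post - theta

def slide_threshold(theta, r_post, rate=0.01, gain=1.0):
    """Hypothetical sliding threshold: low-pass filter of postsynaptic activity."""
    return theta + rate * (gain * r_post - theta)

print(plasticity_quantity(10.0, 10.0, r_post=2.0, R=1.0, theta=0.5))  # 1.5 (LTP)
print(plasticity_quantity(10.0, 10.0, r_post=2.0, R=0.1, theta=0.5))  # about -0.3 (LTD)
```

Low reinforcement turns the same pre/post pairing from potentiation into depression in this sketch, and a postsynaptic cell that is persistently active raises its own threshold.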
Local circuit mechanisms
Activity within each neural population was based on very simple
local circuitry. The output of each excitatory cell assembly was
computed as a linear function of its input after subtracting a
threshold value. RA included intrinsic excitatory connections that were
used to store syllable representations in a manner analogous to other
associative memory models or so-called attractor networks (Table 3,
number 4A; Amit 1989). To minimize computation, only RA
includes such connections. Each population also includes a single
inhibitory assembly that is connected to all assemblies within the
corresponding population (Table 3, number 4B). Inhibition is of two
basic types. HVc_RA, HVc_AFP, and the AFP use "feedforward inhibition," in which inhibitory activity is equal to the average afferent input received by the population, minus a
threshold. RA uses "feedback inhibition," in which inhibitory
activity is driven by the average activity within the local
population. Feedback inhibition allows tighter control of the activity
in the local network but is computationally more expensive. Within a
given population, all excitatory assemblies receive a similar level of
inhibition. Since the only assemblies that get strongly activated are
those that receive enough input to overcome this inhibition, inhibition
mediates a form of "competition" among excitatory assemblies.
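The two inhibition schemes can be sketched as follows, assuming a single inhibitory unit per population with unit weight and zero thresholds (hypothetical values). Feedforward inhibition is computed in one step from the afferent input. Feedback inhibition depends on the population's own activity and must be relaxed to a steady state iteratively, which is why it is more expensive to compute.

```python
import numpy as np

def threshold_linear(x, threshold=0.0):
    """Linear above threshold, zero below."""
    return np.maximum(0.0, x - threshold)

def feedforward_inhibition(afferent, inh_threshold=0.0, w_inh=1.0):
    """Inhibition = mean afferent input minus a threshold (HVc_RA, HVc_AFP, AFP)."""
    afferent = np.asarray(afferent, dtype=float)
    inhibition = max(0.0, afferent.mean() - inh_threshold)
    return threshold_linear(afferent - w_inh * inhibition)

def feedback_inhibition(afferent, W_rec=None, w_inh=1.0, dt=0.1, steps=500):
    """Inhibition driven by mean local activity (RA); relaxed to steady state by Euler steps."""
    afferent = np.asarray(afferent, dtype=float)
    W_rec = np.zeros((afferent.size, afferent.size)) if W_rec is None else W_rec
    r = np.zeros_like(afferent)
    for _ in range(steps):
        drive = afferent + W_rec @ r - w_inh * r.mean()
        r += dt * (-r + threshold_linear(drive))
    return r

x = [1.0, 2.0, 3.0, 6.0]
print(feedforward_inhibition(x))  # [0. 0. 0. 3.]
print(feedback_inhibition(x))     # approx [0, 3/7, 10/7, 31/7]
```

With this input, feedforward inhibition leaves only the most strongly driven assembly active, illustrating the competition described above; the feedback version settles at the fixed point r = max(0, x − mean(r)).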
Decorrelating initial connectivity
Our simulations were designed to determine whether associational
learning, guided by template matching signals from the AFP, could
organize initially unstructured connections in the motor pathway to
produce the stored tutor song. The dominant computational problem
encountered in building the model was the positive feedback inherent in
associational learning rules: correlated activity increases synaptic
strength, which tends to further strengthen the correlation. Left
unchecked, this learning will continually amplify initially weak
associations, even spurious associations resulting from chance events.
One of the most important factors contributing to spurious correlations
was the limited size of our network simulations. The strength of random
correlations is highly dependent on network size, roughly decreasing
with the square root of the number of network units. Because of the
computational expense of simulating intrinsic feedback dynamics within
RA, we limited the number of RA assemblies to 40 (Fig. 6B).
Independently choosing each connection within such a network will
result in correlations that are an order of magnitude stronger than
those expected in a more realistically sized network containing 4000 assemblies. The calculation of HVc_RA activity was computationally less
expensive, and a larger number (5 × 40 = 200) of HVc_RA
assemblies was included. While reducing correlations to some degree,
these numbers still do not approach physiologically realistic values. We note here that the greater storage capacity of larger networks, which results from a reduction in random correlations (Amit 1989), may relate to reports of a relationship between the size
of various song nuclei and the number of song syllables learned
(reviewed in Brenowitz 1997; Nordeen and Nordeen 1997).
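The square-root scaling can be checked numerically. The sketch below is an illustration, not part of the model: it draws pairs of independent random input vectors of length N and averages the magnitude of their Pearson correlation. Going from 40 to 4000 units should shrink that value by roughly √(4000/40) = 10. Uniform random inputs and the number of sampled pairs are arbitrary choices.

```python
import numpy as np

def mean_abs_random_correlation(n_units, n_pairs=500, seed=0):
    """Average |Pearson r| between independently drawn random input vectors."""
    rng = np.random.default_rng(seed)
    a = rng.random((n_pairs, n_units))
    b = rng.random((n_pairs, n_units))
    a -= a.mean(axis=1, keepdims=True)
    b -= b.mean(axis=1, keepdims=True)
    r = (a * b).sum(axis=1) / np.sqrt((a * a).sum(axis=1) * (b * b).sum(axis=1))
    return np.abs(r).mean()

small = mean_abs_random_correlation(40)
large = mean_abs_random_correlation(4000)
ratio = small / large
print(ratio)  # roughly 10, i.e., about one order of magnitude
```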
To address the problem of correlated connections, we chose initial
patterns of connectivity specifically aimed at minimizing these
correlations (Table 3, number 5). Initial connection strengths were
chosen according to two basic strategies. For HVc_RA → RA and HVc_RA → HVc_AFP connections, we used a "single-projection" strategy, in
which each presynaptic assembly connects with a single postsynaptic
assembly. This ensures that the levels of input received by any two
assemblies in the postsynaptic population are independent. However, the
single-projection strategy does not prevent correlations arising from
polysynaptic pathways within the recurrent circuitry in RA. For these
intrinsic RA connections, we used a "uniform" strategy, in which
each presynaptic assembly connects with all postsynaptic assemblies
with equal strength. This ensures that all correlations result from a
global signal shared by all assemblies. While such a signal will
increase overall synaptic strengths, it will not lead to spurious
patterns of correlations within the network. To ensure that
our model was robust to some degree of correlation, zero-mean Gaussian
perturbations were added to all plastic connections during the
initialization process. The standard deviation of the perturbations was
set to 10% of the strength of the nonzero synapses. After the
perturbation, negative strengths were set to zero. Noise was not added
to the three projections that did not undergo plasticity (the premotor
drive, auditory feedback, and template storage connections from HVc_AFP
to the AFP).
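The initialization strategies can be sketched as below. Matrices are indexed W[post, pre]. The uniform recurrent strength (0.1) and the random seed are hypothetical, and the single-projection helper assumes the presynaptic count is a multiple of the postsynaptic count (200 HVc_RA onto 40 RA assemblies gives five inputs per RA assembly). As in the text, noise is added to every entry of a plastic matrix, negative values are clipped to zero, and RA self-connections stay at zero.

```python
import numpy as np

def single_projection(n_pre, n_post, strength, rng):
    """Each presynaptic assembly contacts exactly one postsynaptic assembly.

    Assumes n_pre is a multiple of n_post, so every target receives n_pre // n_post inputs.
    """
    W = np.zeros((n_post, n_pre))                       # W[post, pre]
    targets = rng.permutation(np.arange(n_pre) % n_post)
    W[targets, np.arange(n_pre)] = strength
    return W

def uniform_recurrent(n, strength):
    """Every assembly contacts every other with equal strength; no self-connections."""
    W = np.full((n, n), strength, dtype=float)
    np.fill_diagonal(W, 0.0)
    return W

def perturb(W, rng, rel_sd=0.10, keep_zero=None):
    """Add zero-mean Gaussian noise (SD = 10% of nonzero strength), then clip at zero."""
    sd = rel_sd * W[W != 0].mean()
    P = np.maximum(0.0, W + rng.normal(0.0, sd, size=W.shape))
    if keep_zero is not None:
        P[keep_zero] = 0.0                              # e.g., keep the RA diagonal empty
    return P

rng = np.random.default_rng(0)
W_hvc_ra = perturb(single_projection(200, 40, 1.0, rng), rng)
W_ra_ra = perturb(uniform_recurrent(40, 0.1), rng, keep_zero=np.eye(40, dtype=bool))
```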
Homeostatic mechanisms
In addition to decorrelating initial connectivity patterns, we
include two sources of homeostatic negative feedback to
counteract the positive feedback inherent in associational learning
(Table 3, number 6). The first is a normalization of synaptic strength: after applying associational change for each simulated syllable, the
strengths of all synapses onto (or from) a given assembly are
multiplied by a single number so that the total amount of postsynaptic
(or presynaptic) strength for any one assembly remains nearly constant
(see APPENDIX). This kind of multiplicative normalization controls total synaptic strength without altering the relative magnitude of the individual connections. Presynaptic normalization was
applied before postsynaptic normalization (see APPENDIX).
The strengths to which synaptic connections were normalized were chosen by hand so that 1) intrinsic RA circuitry contributed a
large component (50%) of the input to RA assemblies, and 2)
auditory feedback contributed a modest portion (20%) of the input to
HVc_AFP. The mechanisms underlying homeostasis are just now beginning
to receive focused attention. Multiplicative normalization of synaptic strength has been shown by Turrigiano et al. (1998) and
was hypothesized to depend on mean levels of activity. An approximation
to our postsynaptic normalization rule follows if mean levels of
activity (calculated on long time scales) are related to total
excitatory strength synapsing on that neuron. Mechanisms such as
conservation of the amount of transmitter released and/or retrograde trophic factors
could underlie presynaptic normalization.
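A sketch of the two-stage multiplicative normalization, presynaptic before postsynaptic, for a matrix indexed W[post, pre]. The target totals are free parameters here (the hand-chosen values used in the model are not reproduced), and leaving assemblies with zero total strength unscaled is an assumption that avoids division by zero.

```python
import numpy as np

def normalize(W, pre_total, post_total):
    """Multiplicative normalization of W[post, pre]: presynaptic first, then postsynaptic.

    Each stage multiplies a whole column (or row) by one factor, so ratios within it are kept.
    """
    W = np.asarray(W, dtype=float)
    col = W.sum(axis=0, keepdims=True)                  # total output of each presynaptic assembly
    W = W * np.divide(pre_total, col, out=np.ones_like(col), where=col > 0)
    row = W.sum(axis=1, keepdims=True)                  # total input to each postsynaptic assembly
    W = W * np.divide(post_total, row, out=np.ones_like(row), where=row > 0)
    return W

W = normalize([[1.0, 3.0], [2.0, 2.0]], pre_total=4.0, post_total=1.0)
print(W)              # rows: [5/14, 9/14] and [5/8, 3/8]
print(W.sum(axis=1))  # [1. 1.]
```

After the postsynaptic step the row totals are exact, while the column totals drift slightly away from pre_total, consistent with totals that remain only nearly constant.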
The second source of negative feedback is inhibitory plasticity that is
homeostatic, i.e., if an excitatory assembly becomes too active, the
inhibitory connection onto that assembly is strengthened (Rutherford et al. 1997; see APPENDIX). We
note that controlling feedback in the model was not always
straightforward, since oscillatory instability results if negative and
positive feedback mechanisms operate on similar time scales.
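Homeostatic inhibitory plasticity can be illustrated with a single assembly. In this sketch, which assumes a form rather than reproducing the APPENDIX rule, the inhibitory weight grows in proportion to how far the assembly's activity exceeds a target level. The target, learning rate, and inputs are hypothetical.

```python
def homeostatic_inhibition(exc_input, inh_input, target, w_inh=0.0, rate=0.1, steps=200):
    """Strengthen inhibition onto an assembly whose activity exceeds a target level.

    Activity is threshold-linear: r = max(0, exc_input - w_inh * inh_input).
    Update (hypothetical form): w_inh += rate * (r - target) * inh_input, kept >= 0.
    """
    for _ in range(steps):
        r = max(0.0, exc_input - w_inh * inh_input)
        w_inh = max(0.0, w_inh + rate * (r - target) * inh_input)
    r = max(0.0, exc_input - w_inh * inh_input)
    return w_inh, r

print(homeostatic_inhibition(2.0, 1.0, target=0.5))  # approx (1.5, 0.5)
```

In the linear regime the error in w_inh shrinks by a factor (1 − rate · inh_input²) per step. Values of rate · inh_input² between 1 and 2 therefore make the weight overshoot and oscillate before settling, and larger values are unstable. This is a single-loop analogue of the oscillatory instability noted above.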
Quantifying learning time course
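To quantify the learning time course, we divided the model
output into 250 syllable epochs and computed the matrix
$M_{\mathrm{act}}$ of co-fluctuations in activity
between each pair of RA assemblies over each epoch. During the
$m$th epoch

$$M_{\mathrm{act},ij}(m) = \left\langle \left(r_i - \langle r_i \rangle_m\right)\left(r_j - \langle r_j \rangle_m\right) \right\rangle_m,$$

where $r_i$ is the activity of RA assembly $i$ during a syllable and $\langle \cdot \rangle_m$ denotes the average over the syllables in epoch $m$. Convergence toward the tutor song was quantified by the correlation coefficient (CC) between the off-diagonal entries of $M_{\mathrm{act}}(m)$ and those of $M_{\mathrm{syl}}$, the covariance matrix expected for a perfect rendition of the tutor song.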
The CC was also used to monitor the connectivity appropriate for the
efference copy mapping and for the intrinsic RA connectivity that
underlies syllable encoding. For the efference copy, we measured the CC
between the matrix of motor → sensory connection strengths from
HVc_RA → HVc_AFP and the HVc_RA → RA connections in the motor pathway. To quantify the development of syllable-based connectivity, we
calculated the CC between $M_{\mathrm{syl}}$ and the
matrix of intrinsic RA connection strengths, again excluding diagonal entries.
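The learning-curve measure can be sketched with NumPy. The layout of 4 tutor syllables × 10 RA assemblies, the indicator coding of activity, the epoch length of 250 syllables, and the analytic form of M_syl for equally likely syllables are assumptions made for illustration. A perfect rendition should give a CC near 1 and unstructured output a CC near 0.

```python
import numpy as np

N_SYL, PER_SYL = 4, 10          # hypothetical layout: 4 tutor syllables x 10 RA assemblies
N = N_SYL * PER_SYL             # 40 RA assemblies
labels = np.repeat(np.arange(N_SYL), PER_SYL)

def ideal_syllable_covariance():
    """M_syl for a perfect rendition with each syllable equally likely (indicator code)."""
    p = 1.0 / N_SYL
    same = labels[:, None] == labels[None, :]
    return np.where(same, p * (1 - p), -p * p)

def cofluctuation_matrix(R_epoch):
    """M_act: covariance between RA assemblies across the syllables of one epoch."""
    return np.cov(R_epoch, rowvar=False)

def offdiag_cc(A, B):
    """Correlation coefficient between off-diagonal entries (diagonal excluded)."""
    mask = ~np.eye(A.shape[0], dtype=bool)
    return np.corrcoef(A[mask], B[mask])[0, 1]

rng = np.random.default_rng(1)
M_syl = ideal_syllable_covariance()
sung = rng.integers(0, N_SYL, size=250)                      # one epoch (assumed 250 syllables)
perfect = (labels[None, :] == sung[:, None]).astype(float)   # tutor-matched RA output
random_out = rng.random((250, N))                            # unstructured RA output
print(offdiag_cc(cofluctuation_matrix(perfect), M_syl))      # near 1
print(offdiag_cc(cofluctuation_matrix(random_out), M_syl))   # near 0
```

The same offdiag_cc helper applies unchanged to the intrinsic RA weight matrix, which is how the connectivity curve is compared with M_syl.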
RESULTS
In presenting the results of our computational model, we focus on the quantitative data produced by a single representative simulation. This allows a step-by-step illustration of song development and demonstrates the mutual consistency of the functional hypotheses described above. After presenting these results, we show how the model reacts to changes in important parameters.
Problem 1: Auditory feedback delay
The first step in our proposed solution to the problem of feedback
delay is the learning of an efference copy mapping. At the beginning of
each simulation, connections in the motor pathway are unstructured and
random HVc_RA activity leads to random patterns of RA output (Fig.
6C). Efference copy learning results from associations between the random patterns of HVc_RA activity and HVc_AFP activity induced by auditory reafference (Fig. 4A). We examined the
development of an efference copy map in two ways (Fig.
8). First, an accurate map should cause
efference copy activity to match the auditory reafference. Figure
8A shows the pattern of vocal output (left column of
each pair, marked V) and HVc_AFP efference copy activity (right column of each pair, marked EC) for five syllables
spanning the period of initial efference copy learning. Note that,
because of our simple encoding scheme, vocal output, RA motor activity, and auditory feedback have equivalent representations (see
METHODS). Initially, both patterns of activity are highly
distributed and unrelated (Fig. 8A, left pairs).
As efference copy learning progresses, the activity remains distributed
(significant syllable learning has not taken place), but the efference
copy activity is highly correlated with the vocal output (Fig.
8A, right pair). Note that a perfect match is not
required (Jordan and Rumelhart 1992); the efference copy
estimate only has to be accurate enough so that, on average, the AFP
will reinforce the proper correlations in RA (see Fig.
9, bottom). An accurate
efference copy mapping can also be measured by determining the
similarity between the mapping of HVc_RA onto motor features in RA and
the efference copy mapping of HVc_RA onto sensory features in HVc_AFP.
Figure 8B shows the correlation coefficient (see
METHODS) between the connection strengths from HVc_RA → RA and those from HVc_RA → HVc_AFP. By syllable 500, efference copy
correlation has reached 0.81, 84% of the maximum value (0.96) reached
during the simulation.
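Efference copy learning can be illustrated with a covariance form of Hebbian learning. The sketch assumes that RA output is a linear single-projection readout of HVc_RA activity and that auditory feedback equals RA output, following the equivalent-representation assumption stated above. It accumulates the HVc_RA/feedback covariance in batch rather than online, omits normalization and RA circuitry, and reports the map-to-map correlation used in Fig. 8B. Sizes follow the model (200 HVc_RA, 40 RA), but the distribution of the random drive is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
N_HVC_RA, N_RA = 200, 40

# Motor map HVc_RA -> RA: single projection (each HVc_RA assembly drives one RA assembly).
W_motor = np.zeros((N_RA, N_HVC_RA))
W_motor[np.arange(N_HVC_RA) % N_RA, np.arange(N_HVC_RA)] = 1.0

def learn_efference_copy(n_syllables, rate=1.0):
    """Covariance-Hebbian estimate of the HVc_RA -> HVc_AFP map from random babbling."""
    X = rng.random((n_syllables, N_HVC_RA))   # random premotor drive in HVc_RA
    Y = X @ W_motor.T                         # RA output = auditory feedback (shared code)
    Xc, Yc = X - X.mean(axis=0), Y - Y.mean(axis=0)
    return rate * Yc.T @ Xc / n_syllables     # indexed [post, pre], like W_motor

def map_cc(A, B):
    """Correlation coefficient between corresponding entries of two connection matrices."""
    return np.corrcoef(A.ravel(), B.ravel())[0, 1]

for n in (50, 500, 5000):
    print(n, round(map_cc(learn_efference_copy(n), W_motor), 2))  # rises toward 1 with n
```

As in Fig. 8B, the match need not be perfect for the efference copy to be useful; it only has to improve with experience.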
Problem 2: Syllable learning in RA
CALCULATION OF THE REINFORCEMENT SIGNAL. The AFP guides syllable learning by transmitting a nonspecific reinforcement signal that uniformly modulates plasticity in all RA assemblies. To calculate the match to the template, each AFP assembly sums the input from HVc_AFP assemblies encoding a distinct tutor syllable (Fig. 6B). The competition mediated by mutual inhibition in the AFP ensures that significant activation of the AFP occurs only if HVc_AFP activity is mostly confined to assemblies corresponding to one (or a few) tutor syllables. The final reinforcement value was obtained by thresholding each AFP assembly's output and summing these thresholded outputs (see APPENDIX for details).
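The reinforcement computation can be sketched as follows. Template connections sum HVc_AFP activity within each tutor syllable's group, feedforward inhibition (the mean AFP input) implements the competition, and each AFP output is thresholded before summing. The 4 × 10 layout and the threshold value are hypothetical, and the APPENDIX details are not reproduced.

```python
import numpy as np

N_SYL, PER_SYL = 4, 10                         # hypothetical HVc_AFP layout
labels = np.repeat(np.arange(N_SYL), PER_SYL)

def reinforcement(ec, theta_r=1.0):
    """Nonspecific reinforcement R computed from HVc_AFP efference copy activity."""
    afp_in = np.bincount(labels, weights=ec, minlength=N_SYL)  # template connections
    inhibition = afp_in.mean()                                 # feedforward inhibition
    afp_out = np.maximum(0.0, afp_in - inhibition)             # competition among AFP assemblies
    return np.maximum(0.0, afp_out - theta_r).sum()            # threshold each output, then sum

one = (labels == 0).astype(float)                    # activity confined to one syllable
two = 0.5 * ((labels == 0) | (labels == 1))          # same total, split over two syllables
spread = np.full(N_SYL * PER_SYL, 0.25)              # same total, spread over all four
print(reinforcement(one), reinforcement(two), reinforcement(spread))  # 6.5 3.0 0.0
```

Holding total HVc_AFP activity fixed, concentrated activity earns the largest reinforcement and activity split between two syllables earns less, mirroring the ordering described in the next paragraph.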
The outcome of this procedure is shown in Fig. 9. Figure 9A shows the vocal output (marked V) and efference copy (marked EC) for 11 consecutive syllables sung during the period of syllable learning. The black bars show the reinforcement signal. This reinforcement is obtained by evaluating the HVc_AFP efference copy activity on the right of each column but is used to modulate associational learning for the RA motor activity generating the vocal output shown on the left. Large reinforcement is obtained when efference copy activity is concentrated within assemblies encoding a single tutor syllable (e.g., syllables 11,006 and 11,009). Smaller reinforcement signals are computed when HVc_AFP activity is distributed among assemblies encoding two syllables (e.g., syllables 11,000 and 11,003). Note that the 11,007th syllable produced by the model was dominated by the motor assemblies encoding D, but the AFP signaled minimal reinforcement because of an inaccurate efference copy representation.
SYNAPTIC REORGANIZATION.
Reinforcement-guided syllable learning is shown in Fig.
10. Initially, RA → RA connection
strengths were set to be nearly equal (A,
middle), minimizing the presence of randomly correlated
connections that would have to be "unlearned" (see
METHODS). Note that self-connections are not included in
our model (diagonal entries are zero), since strong self-correlations
would tend to dominate associational learning. Unstructured input from
HVc_RA (A, left) resulted in random patterns of
RA activity (A, right). Because AFP-mediated reinforcement 1) is greatest when assemblies corresponding
to a common tutor syllable are co-active, and 2) results in
large increases in synaptic strength onto active RA assemblies, RA
assemblies began to develop strong connections with other RA assemblies
encoding the same syllable (B, middle).
Reinforcement also guided learning within the HVc_RA → RA projection, causing RA assemblies encoding the same tutor syllable to receive
input from similar sets of HVc_RA assemblies and thus to receive
correlated patterns of HVc input (B, left). Both
the recurrent circuitry and HVc_RA input led to RA activity partially
matched to the tutor syllables (B, right). After
learning was complete, HVc_RA input was a mixture of tutor syllable
representations (C, left). Strong intrinsic circuitry (C, middle) amplifies the activity
within assemblies encoding the most strongly driven syllable, and
inhibitory competition suppresses other responses (see
METHODS and APPENDIX). As a result, the model
produced motor output perfectly matched to the syllables in the tutor
song (C, right). Because HVc_RA continues to be
driven by the random premotor drive, syllables are produced in a random sequence. Sequence learning will be addressed in our companion paper (Troyer and Doupe 2000).
TIME COURSE OF LEARNING. The developmental time course of song learning in our model is shown in Fig. 11. To quantify convergence toward the tutor song, we first computed an "ideal" syllable covariance matrix, Msyl. This is a 40 × 40 matrix containing the covariance in the level of activity between each pairing of the 40 RA assemblies (sampled over a number of consecutive syllables), where it is assumed that the model is producing a perfect rendition of the tutor song. Msyl has strong positive entries for pairs of assemblies belonging to the same tutor syllable and negative entries for pairs belonging to different syllables. We then divided the model output into 250 syllable epochs and computed the matrix of co-fluctuations in activity between each pair of RA assemblies over each epoch. Convergence toward the tutor song was quantified by computing the correlation coefficient between the entries in Msyl and those in the co-fluctuation matrix (Fig. 11, solid line). For detailed definitions of these calculations, see METHODS, Quantifying learning time course. We also computed the correlation coefficient between the pattern of RA connectivity and Msyl (Fig. 11, dashed line). The development of intrinsic RA connectivity is mirrored by the appearance of the corresponding correlations in RA activity.
Problem 3: Separating motor and sensory signals in HVc
HVc_AFP receives two functionally distinct sets of inputs:
efference copy inputs from HVc_RA and auditory feedback (Fig. 7). The
unmixing of signals is addressed in our model by 1) using weak feedback, and 2) including "adaptation" in HVc_AFP
(Fig. 5). The action of the HVc_AFP adaptation mechanism is shown in Fig. 12. Excitation within HVc_AFP
assemblies recruits a negative current that decays exponentially (Fig.
12A, bottom). When the efference copy input from
HVc_RA correctly predicts the pattern of auditory feedback,
adaptation (open arrow) counteracts the delayed auditory feedback
(black arrow) and prevents inappropriate activity during the early
portion of the next syllable (x). Adaptation also prevents
auditory inputs from driving HVc_AFP activity during the gap portion
between syllables (McCasland and Konishi 1981). If the
efference copy does not predict the level of auditory feedback, the
feedback will not be canceled and this will result in an inaccurate efference copy during the subsequent syllable (not shown). However, associational learning triggered by such an "error response" in HVc_AFP will strengthen the connections from the HVc_RA assemblies that
drove the strong auditory feedback, thereby improving the accuracy of
future efference copy predictions. We quantified the efficacy of the
cancellation mechanism by calculating the correlation coefficient
between the pattern of HVc_AFP activity during the early portion of
each syllable and the auditory feedback from the previous syllable.
Initially, this quantity is small and positive, since auditory feedback
is weak relative to the input from HVc_RA (Fig. 12B, solid
line). As the efference copy map is learned (dashed line), the
adaptation cancellation mechanism reduces the correlation with the
auditory feedback signal to values approximately equal to zero.
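The cancellation measure can be illustrated with a per-epoch sketch. Adaptation recruited by the efference-copy-driven activity of one syllable decays exponentially and is subtracted from the summed input early in the next syllable. The sizes, time constants, 20% feedback gain (taken from METHODS), and an adaptation gain tuned to match the feedback amplitude are all assumptions. With an accurate efference copy, early activity becomes uncorrelated with the previous syllable's feedback; with an unrelated one, a small positive correlation remains, as in Fig. 12B.

```python
import numpy as np

def early_epoch_activity(ec_prev, ec_next, fb_prev, adapt_gain, decay):
    """HVc_AFP activity early in the next syllable, with adaptation (per-epoch sketch)."""
    adaptation = adapt_gain * decay * np.maximum(0.0, ec_prev)  # recruited, then decayed
    return np.maximum(0.0, ec_next + fb_prev - adaptation)

rng = np.random.default_rng(0)
N, T, FB_GAIN = 40, 1000, 0.2       # hypothetical sizes; feedback ~20% of HVc_AFP input
decay = np.exp(-10.0 / 50.0)        # hypothetical: 10 ms elapsed, 50 ms adaptation time constant
adapt_gain = FB_GAIN / decay        # assumed tuned so adaptation matches feedback amplitude

motor = rng.random((T, N))          # RA output on each syllable
fb_prev = FB_GAIN * motor           # delayed auditory feedback (shared code)
ec_next = rng.random((T, N))        # efference copy for the following syllable

def feedback_cc(ec_prev):
    """CC between early-epoch HVc_AFP activity and the previous syllable's feedback."""
    early = early_epoch_activity(ec_prev, ec_next, fb_prev, adapt_gain, decay)
    return np.corrcoef(early.ravel(), fb_prev.ravel())[0, 1]

print(feedback_cc(motor))               # accurate efference copy: near 0
print(feedback_cc(rng.random((T, N))))  # unrelated efference copy: small, positive
```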
Range of model behavior
Our results demonstrate the plausibility of our hypothesis that an efference copy mapping is used to guide the sensorimotor learning of birdsong. A detailed assessment of how our model reacts to changes in its multiple parameters is beyond the scope of this paper. Here, we briefly demonstrate the model's robustness using default parameter values and describe the most common types of breakdown in model behavior. Results are shown for alterations in four parameters: 1) the learning rate; 2) the level of correlation within the initial pattern of synaptic connectivity; 3) LTP/LTD threshold in RA; and 4) the strength of connections onto RA assemblies.
To test the robustness of the model, we ran the algorithm 10 times with parameters fixed at their default values. Simulations differed in the random seeds used to determine the initial synaptic connection strengths and the sequence of random premotor drives. The time course of learning in these simulations is shown in Fig. 13. Syllable learning followed a similar time course in all simulations, eventually resulting in accurate reproductions of the template syllables.
The most difficult problem encountered in constructing a working algorithm was the instability of learning due to the positive feedback inherent in associational plasticity, i.e., correlated activity leads to stronger connections, which in turn lead to more strongly correlated activity. This is a general and largely unresolved theoretical problem; in our model we addressed it using a variety of negative feedback mechanisms (see METHODS). When syllable learning did break down, two general types of errors were most common. First, spurious correlations caused errors during learning. For example, faster learning rates led to more rapid synaptic change, making spurious correlations more prominent. These spurious correlations were then amplified by associational learning. Alternatively, increased correlations within the initial pattern of synaptic connectivity could also be amplified, leading to degraded learning (see METHODS and APPENDIX). The parameter dependence of these effects is shown in Fig. 14A. Increasing the initial level of correlation resulted in gradual loss of learning. Increasing learning rates led to highly variable results, with one simulation showing perfect learning, even when learning rates were increased by a factor of 10. Figure 14B shows spurious correlations in a different simulation in which learning rates were increased by a factor of 10 (Fig. 14A, arrow). While many assemblies are co-active with others in the same tutor syllable representation, other assemblies are "incorrectly" co-active with assemblies from different syllables (Fig. 14B, left panel). These incorrect associations can be seen in the pattern of intrinsic RA connections, where strong connections are concentrated in blocks along the diagonal, but scattered inappropriate connections are seen as well (Fig. 14B, right panel; cf. Fig. 10).
A second type of common error was caused by a misadjustment of competitive mechanisms in the network. The most important parameter contributing to competition between different representations was the LTP/LTD threshold (see METHODS). Competition could also be increased by increasing inhibition. We increased inhibition indirectly, by first scaling the total excitatory synaptic strength in RA, which then triggered a homeostatic increase in inhibition. The parameter dependence of these effects is shown in Fig. 14C. Figure 14D shows the effects of increased competition due to increasing the LTP/LTD threshold in RA by a factor of 2 (Fig. 14C, arrow). Because of the increased competition, syllable representations have been split into subsyllables, in which RA assemblies are strongly connected to and co-active with a subset of the assemblies encoding the same tutor syllable.
DISCUSSION
Principal findings and predictions
By constructing a computational model, we have provided a
theoretical demonstration that associational plasticity, guided by
template comparison signals transmitted by the AFP, can account for the
sensorimotor learning of birdsong syllables. The model incorporates a
wide range of experimental data related to song learning and addresses
the crucial problem of feedback delay during motor learning: the delay
for reafferent auditory signals returning to RA via the AFP is
estimated to be nearly as long as a typical song syllable. Our model
suggests that the bird solves this problem by generating an internal
prediction, or efference copy, of the expected auditory reafference
within the song nucleus HVc. This efference copy is then compared with
the stored template to guide song learning. Thus, we predict that
activity recorded in the AFP during singing (Hessler and Doupe 1999a,b) is a motor signal encoding sensory information.
Experiments designed to test this prediction face two significant
challenges. First, the nature of both sensory and motor representations
within the song system is poorly understood. Direct tests will require
substantial progress in this area. Second, separating the sensory and
motor aspects of neural activity during sensorimotor behaviors is
notoriously difficult. For example, the auditory processing of a
bird's own feedback during singing, and the processing of the same
auditory signal when not singing, may be quite different. One approach
to this problem is to alter auditory feedback pathways and monitor
neural activity and/or motor output during singing. Our model predicts
that changes in song due to altering the auditory feedback pathway
should be indirect effects of perturbing the efference copy mapping,
which then causes errors in the motor output. Since we assume that
auditory feedback does not play an active role during vocal production,
we expect that removal of auditory feedback by deafening should result
in no immediate change in vocal output, consistent with data from many
avian species (Konishi 1965; Nottebohm 1968; Price 1979). Use of an efference copy is
also consistent with the slow degradation of song after complete
removal of auditory feedback by deafening (Nordeen and Nordeen 1992) and the more rapid degradation seen after perturbation of
auditory feedback by consistent playback of auditory signals any time
the bird sings (Leonardo and Konishi 1999). Altered
feedback is expected to result in an active and hence more rapid
alteration of the efference copy mapping, whereas removal of feedback
could allow a passive drift in the efference copy map, resulting in a
slow degradation of song. Note that these data are not naturally
accounted for by "error-based" learning hypotheses, since deafening
results in a large change in the sensory signal. By retaining the key
elements of the AFP comparison hypothesis, our model is consistent with
the finding that AFP lesions prevent the disruption of song due to
deafening (Brainard and Doupe 2000).
In addition to being consistent with the behavioral data, our model
makes the specific prediction that a mismatch between actual and
expected auditory feedback should elicit a detectable change in the
song-related activity of HVc_AFP neurons. This change may also be
indirectly registered in AFP neurons (as well as HVc_RA neurons, see
Troyer and Doupe 2000). If the mismatch is sustained, increasingly significant changes in song-related neural activity should
be seen in HVc and the AFP over time. Since a change in AFP output is
required to alter the connectivity in RA in the model, these changes
should be detectable before significant changes appear in the patterns of RA motor activity or song output.
At the level of song circuitry, the model predicts that the
motor-to-sensory transformation necessary for an efference copy is
learned within the connections between the two populations of HVc
projection neurons. Consistent with earlier suggestions based on
physiological evidence, the model makes the further anatomical hypothesis that auditory afferents to HVc should preferentially (although not necessarily exclusively) synapse onto AFP-projecting neurons while premotor afferents should preferentially synapse onto
RA-projecting HVc neurons (Katz and Gurney 1981; Kimpo and Doupe 1997; Lewicki 1996; Saito and Maekawa 1993).
In addition to the problem of feedback delay, our model solves a second
problem posed by the AFP comparison hypothesis: if the AFP were to
directly evaluate auditory feedback signals, these signals would have
to bypass strong, ongoing premotor activity within HVc (Fig. 1, C and D; McCasland 1987; McCasland and Konishi 1981; Yu and Margoliash 1996; but see Foster and Bottjer 1998). By
proposing that the AFP evaluates an efference copy, our model circumvents this problem and suggests a functional reason why the
AFP lies downstream of HVc: use of an efference copy requires that
template comparison take place downstream of the motor pattern generator. Because auditory inputs are necessary for efference copy
learning, however, our solution does not eliminate the problems raised
by the mixing of motor and sensory signals within HVc. We predict that
these signals are kept separate in HVc both by the greater strength of
the efference copy signal, and by the cancellation of auditory
reafference (McCasland and Konishi 1981) by strong
adaptation mechanisms (synaptic or neuronal) within HVc_AFP. A slow
after-spike hyperpolarization recently found in AFP-projecting HVc
neurons (Dutar et al. 1998) may contribute to this
cancellation. Because the cancellation depends on an accurate efference
copy, our model predicts that auditory signals recorded in HVc or the
AFP should be stronger in very young birds than in juveniles or adults.
Finally, we propose that circuits intrinsic to RA play an important
role in encoding the motor programs for individual song syllables (cf.
Spiro et al. 1999). This proposal is consistent with
anatomical data (Herrmann and Arnold 1991) as well as
the hypothesis that the precision of RA activity (Yu and Margoliash 1996) emerges as a result of neural circuitry
intrinsic to RA, rather than being driven by (temporally less precise)
inputs from HVc.
Alternative models and mechanisms
Our model constitutes a sufficiency argument, i.e., the
model demonstrates that the proposed hypotheses are
sufficient to solve important problems related to song
learning. However, experimental data related to song learning are
simply too sparse to rule out a wide range of possibilities. In
particular, there are several alternatives to our proposal that
efference copy is used to mitigate the problem of feedback delay. For
example, it is possible that some song learning is done "off-line,"
i.e., aspects of motor activity and sensory reafference may be stored
in medium- or long-term memory and used to readjust the motor circuit
when the bird is not singing, perhaps during sleep (Dave et al.
1999). A more likely alternative is the use of short-term or
"working" memory mechanisms. At the synaptic level, "memory traces" (Houk et al. 1995) could "tag" (Frey and Morris 1998) a synaptic site, making it
receptive to delayed signals related to template comparison. At the
network level, activity related to the motor command could be
maintained within a feedback circuit and then compared with the
auditory feedback when it arrives. These proposals raise a number of
questions that have yet to be investigated. Most notable is the
difficulty of directing memory signals toward the appropriate
connections in a manner that is not disturbed by the presence of strong
ongoing motor activity. The problem of segregating motor and sensory
signals is also shared by song learning models in which HVc both
generates premotor commands and passes auditory feedback information on to the AFP. For example, Doya and Sejnowski (1998)
have
proposed a model that is similar to ours in adopting the AFP comparison hypothesis and using reinforcement learning to guide song learning in
RA (see also Fry 1996
). However, Doya and
Sejnowski (1998)
do not address the problems of feedback delay
and sensory/motor mixing in HVc that lie at the core of our model.
Weaknesses of the model
Our model has a number of weaknesses. The most important is our reliance on strong simplifying assumptions about how sensory and motor information related to song is encoded. These simplifications were chosen for two main reasons. First, very little is known about how song is encoded in the patterns of neural activity distributed across the various song nuclei. Second, our theoretical understanding of Hebbian learning rules is limited. In particular, the tendency of these rules to amplify "spurious" correlations has only been addressed in networks of limited complexity.
Another weakness of the model is that the auditory feedback problem has
only been partially addressed. Using our estimates, AFP comparison
signals will arrive in RA with a delay of roughly 40 ms. While this
does prevent significant overlap with the motor activity related to the
next syllable, 40 ms may still represent a significant delay given that
RA motor activity is time-locked to the motor output with a precision
of less than 5 ms (Yu and Margoliash 1996). One
possibility for addressing this problem would be to use internal
regularities in the motor program and temporally asymmetric learning
rules to anticipate the future state of the motor program.
Efference copies of this predicted motor command could be
processed in the AFP and arrive in RA time locked to the arrival of the
actual motor command from HVc. Overall, the model points to
the need for better data concerning the temporal relationships between
activity patterns in the various song nuclei during singing. Our
estimate of AFP processing time is based on variable auditory latencies
recorded in anesthetized birds, and hence, is only poorly constrained.
Better timing estimates obtained by microstimulation and/or correlation
analyses in singing birds could yield important information regarding
the functional interactions between various nuclei during song production.
Related to the issue of processing delays is the fact that temporal
aspects of song change with development. Generally, syllables are
longer and produced at a slower tempo in young birds (Immelmann 1969). Thus, one possible solution to the problem of feedback delay is that it may simply be a smaller problem in juveniles. However,
this solution depends on the motor commands for the slow juvenile
syllables being nearly identical to those for the more rapid syllables
sung in adulthood. Furthermore, even though syllables are longer,
neural processing may also be slower in young birds, thereby increasing
feedback delay.
Location of the template
Our model is built on the working assumption that the memorized template is stored in the AFP. While the data pointing to the AFP as a candidate site are suggestive (Basham et al. 1996; Bottjer et al. 1984; Scharff and Nottebohm 1991; Sohrabji et al. 1990), direct physiological tests have been equivocal (Doupe and Solis 1997; Solis and Doupe 1997, 1999). Consequently, we have begun to generalize our model to consider the possibility that template information is stored in auditory areas closer to the periphery (for candidate sites, see Bolhuis et al. 2000; Foster and Bottjer 1998; Mello et al. 1998; Vates et al. 1996). Initial simulations suggest that song learning can be guided by auditory feedback that reaches the song system only after it has been filtered through neurons selective for the tutor song. This "template filter hypothesis" raises the possibility that sensorimotor learning may involve the transfer of template information into the song system. The fact that this transfer would rely on the bird's own vocalization serving as a carrier signal is consistent with experimental data showing that AFP neurons develop selective auditory responses to both the bird's own song and the tutor song (Doupe 1997; Solis and Doupe 1997, 1999) during the sensorimotor phase of song learning. Note that efference copy may still play an important role, since a different template storage site does little to alter the basic problem of feedback delay. Further investigations are required to fully explore these possibilities.
Role of efference copy
We have used the term efference copy to refer to a motor signal that has been converted to sensory coordinates. Our use of efference copy is most similar to the notion of a forward model used for motor learning and control (Jordan 1995; Miall and Wolpert 1996; Miall et al. 1993): an internal prediction that is compared with a target reference, in our case, the tutor song. Our model differs from standard motor control models in that the efference copy is primarily used to modulate plasticity rather than to control ongoing vocalization. Moreover, template comparison in our model does not result in an "error" or "mismatch" signal, i.e., the difference between the tutor template and the bird's own song is never computed. Instead, the model relies on "matching" signals that could be easily computed by neurons receiving input from a population of cells broadly tuned to the tutor song.
Our model also uses efference copy in its classic role as a negative image used to "subtract off" sensory reafference (Bell et al. 1997; Sperry 1950; von Holst and Mittelstaedt 1980). However, the purpose of the cancellation is to prevent interference with the ongoing motor program, not to differentiate the sensory signals that are due to an animal's own behavior from those caused by events in the external world. Furthermore, this negative image is a secondary effect in our model, resulting from adaptation mechanisms within HVc_AFP (Fig. 5).
Random motor behavior and innate templates
Our model uses reinforcement learning to refine initially random
activity into motor commands matched to an internal template. A major
drawback of reinforcement learning is the "curse of
dimensionality," i.e., if motor space contains too many degrees of
freedom, the chance of randomly activating an appropriate combination
of motor neurons is exceedingly low. In our model, RA premotor neurons have been preorganized into a small number (40) of motor assemblies. Thus, random activity in the model is actually confined to a relatively narrow range of possible vocal productions, allowing successful reinforcement-based learning after a relatively brief period of vocal
development. In this way, our model can be seen as relying on an
"innate template" (reviewed in Marler 1997) to
reduce the dimensionality of motor space.
Efference copy may be common in many systems
The close parallels between vocal learning in birds and humans (Doupe and Kuhl 1999) suggest that efference copy may also play a role during speech development. For example, speech degrades slowly in humans deafened as adults (Cowie and Douglas-Cowie 1992; Waldstein 1989), but can be altered within an hour by systematic alterations of auditory feedback (Houde and Jordan 1998). Moreover, mismatches between expected and received auditory feedback cause increased activation in auditory language areas in temporal cortex (Hirano et al. 1997; McGuire et al. 1996). These results are entirely consistent with the efference copy hypothesis: the efference copy would drift passively after deafening, but would be actively altered when feedback is altered, perhaps via an association of motor commands with the mismatch signal registered in temporal cortex.
At a general level, our model focuses on the interaction between reciprocally connected populations of neurons, where one population has been assigned a primarily motor and the other a primarily sensory role. This dichotomy parallels traditional views of motor/sensory circuits subserving language (Wernicke 1908) and of frontal/parietal circuits underlying memory-guided reaching and saccade behaviors (Chafee and Goldman-Rakic 1998). Efference copy learning may be a natural consequence of Hebbian learning within such a circuit. Our model suggests that this learning should occur whenever 1) a projection exists from neurons displaying motor activity to neurons that receive sensory inputs, and 2) the time window for associative learning is roughly matched to the sensory feedback delay. The simplicity of these conditions argues that use of an efference copy may be a common strategy for overcoming feedback delay in a wide variety of circuits subserving sensorimotor learning.
APPENDIX
This appendix contains details of the implementation of our computational algorithm. We abbreviate HVc_RA as HR, HVc_AFP as HA, and AFP as AF. $r^{HR}$, $r^{RA}$, $r^{HA\text{-}E}$, $r^{HA\text{-}M}$, $r^{HA\text{-}L}$, $r^{HA\text{-}G}$, and $r^{AF}$ denote firing rates, where E, M, L, and G refer to the early, middle, late, and gap portions of HVc_AFP activity (see Fig. 7). $r^{EC}$ denotes the HVc_AFP efference copy activity passed to the AFP and is calculated as the average HVc_AFP activity during the early (25-ms long) and middle (35-ms long) portions of the current syllable: $r^{EC} = (25\, r^{HA\text{-}E} + 35\, r^{HA\text{-}M})/(25 + 35)$. [HA, HR], [RA, HR], and [RA, RA] denote the three sets of excitatory synaptic connections that undergo learning, where [post, pre] denotes a matrix of synaptic strengths. For example, $[HA, HR]_{ij}$ is the connection strength from the $j$th HVc_RA assembly to the $i$th HVc_AFP assembly.
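The efference copy rate is a duration-weighted mean of the early and middle HVc_AFP rates. A minimal Python sketch of that arithmetic; the function name `efference_copy` is hypothetical, and because it uses only arithmetic operators it also works elementwise on NumPy arrays of assembly rates.

```python
def efference_copy(r_ha_early, r_ha_middle, dur_early_ms=25.0, dur_middle_ms=35.0):
    """Duration-weighted mean of HVc_AFP activity over the early and middle epochs (r^EC)."""
    total_ms = dur_early_ms + dur_middle_ms
    return (dur_early_ms * r_ha_early + dur_middle_ms * r_ha_middle) / total_ms


# Example: an assembly firing at 2.0 early and 1.0 in the middle epoch.
print(round(efference_copy(2.0, 1.0), 4))  # (50 + 35) / 60 -> 1.4167
```

The longer middle epoch carries more weight, so middle-epoch activity dominates $r^{EC}$ when the two rates differ.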
We use $|x|_+ = \max(x, 0)$ to denote rectification, and $\langle x \rangle = \frac{1}{N}\sum_{i=1}^{N} x_i$ to denote averaging. Values of most parameters are expressed in arbitrary units calibrated so that the homeostatically controlled average firing rate and the input/output gain ($\text{output} = \text{gain} \times |\text{net input} - \text{threshold}|_+$) are equal to 1. For HVc_AFP, average firing rate was the weighted average of firing over all four portions of the syllable.
The calculation of neural activity was based on the following rule: $r_i = |\text{input} - \text{adaptation} - \text{inhibition} - \theta|_+$, where $\theta = 1$ is the spike threshold. Only HVc_AFP includes adaptation. Inhibition is calculated as $G_i I$, where $I$ represents the activity in a single local inhibitory assembly, and $G_i$ is the inhibitory strength onto excitatory assembly $i$. In HVc_RA, HVc_AFP, and the AFP, inhibition is "feedforward": inhibitory activity $I$ is set equal to the average afferent (feedforward) input received by the population, minus a threshold, i.e., $I = |\langle \text{aff} \rangle - \theta^I|_+$. $\theta^I$ was set to 20% of the target level of input for these populations (see step 8a below): $\theta^I_{HR} = \theta^I_{HA} = 4$, and $\theta^I_{AF} = 3$. RA included feedback dynamics (described below, step 3). RA inhibition is also "feedback," i.e., $I$ is set equal to the mean level of activity within RA, minus a threshold: $I = |\langle r^{RA} \rangle - \theta^I_{RA}|_+$, with $\theta^I_{RA} = 0.20$.
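The activity rule with feedforward inhibition is shared by HVc_RA, HVc_AFP (which adds adaptation), and the AFP. The following is a minimal NumPy sketch of one evaluation of that rule; the function names are hypothetical, and RA is deliberately not covered because its inhibition is driven by RA activity through feedback dynamics rather than by the mean afferent input.

```python
import numpy as np

THETA = 1.0  # spike threshold theta


def rectify(x):
    """|x|_+ = max(x, 0), applied elementwise."""
    return np.maximum(x, 0.0)


def feedforward_rates(aff, G, theta_I, adaptation=None, theta=THETA):
    """r_i = |aff_i - a_i - G_i * I - theta|_+ with I = |<aff> - theta_I|_+.

    I is a single inhibitory assembly driven by the population-mean afferent
    input. `adaptation` (a_i) is only used for HVc_AFP; omit it elsewhere.
    """
    aff = np.asarray(aff, dtype=float)
    I = max(aff.mean() - theta_I, 0.0)
    a = np.zeros_like(aff) if adaptation is None else np.asarray(adaptation, dtype=float)
    return rectify(aff - a - np.asarray(G, dtype=float) * I - theta)


# Three assemblies with mean afferent input 20 and theta_I = 4, so I = 16.
print(feedforward_rates([24.0, 20.0, 16.0], np.ones(3), theta_I=4.0))  # [7. 3. 0.]
```

Because $I$ tracks the population mean, only assemblies whose input exceeds the average by enough to overcome $G_i I + \theta$ fire, which sharpens the population response.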
Simulations
For each syllable, n, we performed the following nine steps. These were repeated for a fixed number of syllables (usually 25,000).
1. Generate premotor drive, $p_i(n)$, onto each HVc_RA assembly, $i$. First we generate random variables $\xi_i = |\eta_i|_+$, where $\eta_i$ was generated from a Gaussian distribution with mean equal to 3 and variance equal to 1. Then $p_i(n) = p_{drive}\, \xi_i / \langle \xi \rangle$, where $p_{drive} = 20$ is a constant that determines the magnitude of the drive.
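Step 1 draws rectified Gaussian variables (here called `eta` and `xi`) and rescales them by their population mean, so the average drive is always exactly $p_{drive} = 20$. That appears consistent with $\theta^I_{HR} = 4$ being 20% of the target input. A minimal sketch; the function name and the population size of 50 are illustrative assumptions, not values from the text.

```python
import numpy as np

P_DRIVE = 20.0  # p_drive


def premotor_drive(n_assemblies, rng, p_drive=P_DRIVE, mean=3.0, variance=1.0):
    """Random premotor drive p_i(n) onto each HVc_RA assembly for one syllable."""
    eta = rng.normal(loc=mean, scale=np.sqrt(variance), size=n_assemblies)  # Gaussian draws
    xi = np.maximum(eta, 0.0)           # rectification |eta|_+
    return p_drive * xi / xi.mean()     # rescale so the population mean equals p_drive


rng = np.random.default_rng(0)
p = premotor_drive(50, rng)   # 50 assemblies: an illustrative size
print(p.mean())               # 20.0 up to floating-point rounding
```

Normalizing by $\langle \xi \rangle$ makes the drive vary in its pattern across assemblies from syllable to syllable while keeping its total fixed.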
2. Calculate HVc_RA activity. The afferent input to HVc_RA is equal to the premotor drive: $\text{aff}^{HR}_i(n) = p_i(n)$. Output firing rates are determined from $r^{HR}_i(n) = |\text{aff}^{HR}_i(n) - G^{HR}_i I - \theta|_+$, with $I = |\langle \text{aff}^{HR}(n) \rangle - \theta^I_{HR}|_+$.
3. Calculate RA activity. The afferent input is calculated as $\text{aff}^{RA}_i(n) = \sum_j [RA, HR]_{ij}\, r^{HR}_j(n)$. To calculate the output firing rates, the following dynamics were simulated:
[Equations not available]
4. Calculate HVc_AFP activity. Separate calculations were made for the four syllable subdivisions (E, M, L, G; Fig. 7):
[Equations not available]
Output firing rates are determined from $r^{HA}_i(n) = |\text{aff}^{HA}_i(n) - a_i - G^{HA}_i I - \theta|_+$, with $I = |\langle \text{aff}^{HA}(n) \rangle - \theta^I_{HA}|_+$. The level of adaptation, $a_i$, was updated after calculating activity for each of the syllable subdivisions (see step 8c).
5. Calculate AFP activity. $\text{aff}^{AF}_k(n) = \sum_j T_{kj} \sqrt{r^{EC}_j(n)}$, where $T$ is the connection matrix from HVc_AFP to the AFP that encodes tutor syllables (Fig. 6B): $T_{kj} = 1.875$ if assembly $j$ belongs to tutor syllable $k$, and $T_{kj} = 0$ otherwise. A sublinear (square root) function is included so that a better match is obtained when efference copy activity is distributed equally among the assemblies encoding a given tutor syllable, rather than concentrated within just a few assemblies. Output firing rates were calculated as $r^{AF}_k(n) = |\text{aff}^{AF}_k(n) - G^{AF}_k I - \theta|_+$, with $I = |\langle \text{aff}^{AF}(n) \rangle - \theta^I_{AF}|_+$.
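The effect of the square-root pooling can be checked directly: with the same total efference copy activity, an even spread across a syllable's assemblies produces more AFP input than activity concentrated in one assembly. A minimal sketch; the four-assembly syllable layout and the function name are illustrative assumptions.

```python
import numpy as np

T_STRENGTH = 1.875  # T_kj for assemblies belonging to tutor syllable k


def afp_afferent(r_ec, T):
    """aff^AF_k = sum_j T_kj * sqrt(r^EC_j): square-root (sublinear) pooling."""
    return np.asarray(T, dtype=float) @ np.sqrt(np.asarray(r_ec, dtype=float))


# One tutor syllable encoded by four HVc_AFP assemblies (illustrative layout).
T = np.full((1, 4), T_STRENGTH)
spread = afp_afferent([1.0, 1.0, 1.0, 1.0], T)        # total efference copy activity = 4
concentrated = afp_afferent([4.0, 0.0, 0.0, 0.0], T)  # same total, one assembly
print(spread, concentrated)  # [7.5] [3.75]
```

Because $\sqrt{\cdot}$ is concave, the even spread yields twice the afferent input here, which is the bias toward distributed matches described above.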
6. Calculate reinforcement. $R^{syl}_k(n) = |\gamma^{syl}\, r^{AF}_k(n) - \theta_k|_+$ is the contribution to the reinforcement from the match to the $k$th tutor syllable, where $\theta_k$ is a threshold that is adjusted homeostatically (see step 8b below). $\gamma^{syl} = 5$ is a constant that determines how large $\theta_k$ must be to keep $R^{syl}_k$ controlled. A large $\gamma^{syl}$ requires a large value of $\theta_k$ and hence yields significant reinforcement only for the best matches. The total reinforcement signal is $R(n) = c_R\,(0.15 + 0.85\, R^{syl}(n))$, where $c_R = 20$ determines the overall magnitude of the reinforcement signal. Note that 15% of the reinforcement signal is independent of the template match. This is included to be consistent with our model of sequence learning (see Troyer and Doupe 2000).
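A minimal sketch of the two reinforcement formulas in step 6. The text does not spell out how the per-syllable terms $R^{syl}_k(n)$ are combined into $R^{syl}(n)$, so `total_reinforcement` simply takes that combined value as an argument; the names `GAMMA_SYL`, `syllable_reinforcement`, and `total_reinforcement` are hypothetical.

```python
import numpy as np

GAMMA_SYL = 5.0  # gain on the AFP match (value from the text; symbol name assumed)
C_R = 20.0       # c_R, overall magnitude of reinforcement


def syllable_reinforcement(r_af, theta):
    """R^syl_k = |gamma_syl * r^AF_k - theta_k|_+ for each tutor syllable k."""
    return np.maximum(GAMMA_SYL * np.asarray(r_af, dtype=float) - np.asarray(theta, dtype=float), 0.0)


def total_reinforcement(r_syl):
    """R(n) = c_R * (0.15 + 0.85 * R^syl(n)); the 0.15 part ignores the template match."""
    return C_R * (0.15 + 0.85 * r_syl)


print(syllable_reinforcement([0.5, 0.1], [1.0, 1.0]))       # [1.5 0. ]
print(total_reinforcement(0.0), total_reinforcement(1.0))  # about 3.0 and 20.0
```

Even with no template match, the reinforcement signal stays at $0.15\, c_R = 3$, the match-independent baseline noted in the text.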
7. Update synaptic strengths. Synaptic plasticity was based on the following rule:
[Equations not available]
To implement our plasticity rule, we divided time into intervals of constant pre- and postsynaptic activity ($[t_{post}, t_{post} + \Delta_{post}]$ and $[t_{pre}, t_{pre} + \Delta_{pre}]$) that were either 1) completely overlapping ($t_{pre} = t_{post}$, $\Delta_{pre} = \Delta_{post}$), or 2) completely nonoverlapping ($t_{pre} + \Delta_{pre} < t_{post}$). We then rewrote our rule as:
[Equations not available]
To reduce spurious correlations, we included a "momentum" term in our update rule (e.g., Rumelhart et al. 1986). The total synaptic change at each time step, $\Delta[\text{post}, \text{pre}]_{ij}(n)$, was computed by taking the running average of past associations:
[Equations not available]
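The equations for the momentum term are not available here, so the sketch below assumes an exponential running average of the same form as the activity averaging in step 9, $\Delta W(n) = (1 - \alpha)\,\Delta W(n-1) + \alpha A(n)$, with a hypothetical rate $\alpha = 0.1$. It shows how such averaging lets consistent associations build up while associations whose sign fluctuates from syllable to syllable largely cancel. A linear filter like this smooths individual associations rather than discarding them.

```python
def momentum_step(prev_change, association, alpha):
    """Assumed running average of associations: dW(n) = (1 - alpha) * dW(n - 1) + alpha * A(n)."""
    return (1.0 - alpha) * prev_change + alpha * association


alpha = 0.1  # hypothetical averaging rate
consistent, alternating = 0.0, 0.0
for n in range(200):
    consistent = momentum_step(consistent, 1.0, alpha)                            # same sign every syllable
    alternating = momentum_step(alternating, 1.0 if n % 2 == 0 else -1.0, alpha)  # sign flips each syllable
print(round(consistent, 3), round(abs(alternating), 3))  # about 1.0 and 0.053
```

In the alternating case the change settles at $\pm\alpha/(2 - \alpha) \approx 0.053$, about 5% of the consistent case.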
8. Update and apply homeostatic mechanisms.
8a. Normalize synaptic strengths. First, we normalized total synaptic strength for each presynaptic neuron:
[Equations not available]
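The normalization equations are likewise unavailable. Assuming simple multiplicative scaling, normalizing the total strength of each presynaptic neuron amounts to rescaling each column $j$ of a $[\text{post}, \text{pre}]$ matrix to a fixed sum. In the sketch below, the target `total` and the choice to leave all-zero columns unchanged are assumptions, and the weights are assumed nonnegative (excitatory).

```python
import numpy as np


def normalize_presynaptic(W, total):
    """Rescale each column j of W = [post, pre] so that sum_i W[i, j] == total.

    Multiplicative scaling and the `total` target are assumptions; columns
    with no nonzero synapses are left unchanged.
    """
    W = np.asarray(W, dtype=float)
    col_sums = W.sum(axis=0, keepdims=True)
    safe = np.where(col_sums > 0, col_sums, 1.0)       # avoid division by zero
    scale = np.where(col_sums > 0, total / safe, 1.0)
    return W * scale


W = np.array([[1.0, 0.0, 0.0],
              [3.0, 2.0, 0.0]])
Wn = normalize_presynaptic(W, total=2.0)
print(Wn)  # first two columns now sum to 2; the all-zero column is untouched
```

Multiplicative scaling preserves the relative pattern of each neuron's outgoing weights, so Hebbian changes redistribute strength among targets instead of growing without bound.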
8b. Update inhibitory strengths. These updates are designed to keep average activities, $\bar{r}_i$ (see step 9), at the target value $r_{targ} = 1$: $\Delta G^{post}_i = k_{inh}\,(\bar{r}^{post}_i - r_{targ})$, where $k_{HR} = 1 \times 10^{-4}$ and $k_{inh} = 2 \times 10^{-5}$. A similar algorithm was used to determine reinforcement thresholds: $\Delta\theta_k = (2.5 \times 10^{-4})(\bar{R}^{syl}_k - 1)$. Inhibitory changes were also smoothed using momentum (see step 7), but with $\alpha_I = 1/100$, making inhibitory changes faster than excitatory ones and thus avoiding feedback oscillations.
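To see the homeostatic inhibition rule settle, consider a single HVc_RA-like assembly with constant afferent input 20, $\theta^I = 4$, $\theta = 1$, and rate $k_{HR} = 10^{-4}$, combined with the step 9 running average ($\alpha_r = 1/10$). The rate reaches $r_{targ} = 1$ when $20 - 16\,G - 1 = 1$, i.e., $G = 18/16 = 1.125$. This sketch omits the momentum smoothing of inhibitory changes and uses one assembly so that $\langle \text{aff} \rangle = \text{aff}$; the function name is hypothetical.

```python
def homeostatic_inhibition_demo(n_syllables=20000, k_inh=1e-4, alpha_r=0.1,
                                aff=20.0, theta_I=4.0, theta=1.0, r_targ=1.0):
    """Single-assembly sketch of step 8b combined with the running average of step 9.

    Returns the final inhibitory strength G and running-average rate r_bar.
    """
    G, r_bar = 0.0, r_targ
    for _ in range(n_syllables):
        I = max(aff - theta_I, 0.0)                    # feedforward inhibition; <aff> = aff here
        r = max(aff - G * I - theta, 0.0)              # rectified output rate
        r_bar = (1.0 - alpha_r) * r_bar + alpha_r * r  # step 9 running average
        G += k_inh * (r_bar - r_targ)                  # step 8b: raise G when activity exceeds target
    return G, r_bar


G, r_bar = homeostatic_inhibition_demo()
print(round(G, 4), round(r_bar, 4))  # 1.125 1.0
```

The small $k$ relative to $\alpha_r$ keeps the loop overdamped: the running average tracks the rate quickly, while $G$ drifts slowly toward its equilibrium.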
8c. Update adaptation. Only HVc_AFP included adaptation. The level of adaptation for assembly $i$, $a_i(t)$, was updated at the end of each epoch (early, middle, late, and gap; Fig. 7). Adaptation increases in proportion to activity and decreases through exponential decay. Assuming that an assembly had activity $r_i$ for a time period of length $\Delta$ starting at time $t$, $a_i(t + \Delta) = h\,\Delta\, r_i + e^{-\Delta/\tau_{decay}}\, a_i(t)$, with $\tau^{HA}_{decay} = 115$ ms and $h^{HA} = 0.043$ ms$^{-1}$. Assuming a constant activity level of 1, adaptation would have a strength of about five input units ($h\,\tau_{decay} \approx 5$), 25% of the total input during periods when HVc_AFP is receiving both efference copy and auditory input.
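The "five input units" figure follows from the update rule's steady state. For constant $r = 1$, $a = h\Delta/(1 - e^{-\Delta/\tau_{decay}})$, which approaches $h\,\tau_{decay} = 4.945$ for short epochs. The sketch below iterates the rule with 1-ms epochs, an illustrative choice (the model itself updates once per early, middle, late, and gap epoch); the function name is hypothetical.

```python
import math

H_HA = 0.043          # h^HA, per ms
TAU_DECAY_HA = 115.0  # tau_decay^HA, ms


def update_adaptation(a, r, delta_ms, h=H_HA, tau_decay_ms=TAU_DECAY_HA):
    """a(t + delta) = h * delta * r + exp(-delta / tau_decay) * a(t)."""
    return h * delta_ms * r + math.exp(-delta_ms / tau_decay_ms) * a


# Constant activity r = 1, updated in 1-ms epochs (illustrative step size).
a = 0.0
for _ in range(5000):
    a = update_adaptation(a, 1.0, 1.0)
print(round(a, 2), round(H_HA * TAU_DECAY_HA, 3))  # 4.97 4.945
```

With a total input of 20 during combined efference copy and auditory drive, an adaptation level near 5 corresponds to the 25% stated above.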
9. Calculate running averages of activity. The running average of activity was calculated using $\bar{r}(n) = (1 - \alpha_r)\,\bar{r}(n-1) + \alpha_r\, r(n)$. For all activity variables, $\alpha_r = 1/10$. Since reinforcement matches were more variable, slower averaging ($\alpha_{R^{syl}} = 1/100$) was used for $\bar{R}^{syl}_k$.
Initializing variables
Initial excitatory synaptic strengths were set as described in METHODS. To equilibrate homeostatically adjusted variables, 500 syllables were simulated in which no associational learning took place. When reporting our results, these syllables were not included, i.e., syllable number 1 starts after this period.
Simulations with altered parameters
In arriving at our results, many simulations were run in which parameters were varied in a nonsystematic manner (results not reported). To explore the range of model behavior more systematically, simulations were run in which various parameters were increased by a constant factor $c$. Figure 14A (circles) shows increased excitatory and inhibitory learning rates, $k_{post,pre} \to c \times k_{post,pre}$ and $k_{inh} \to c \times k_{inh}$. Figure 14A (plus signs) shows increased correlation in initial connections: the Gaussian noise added when setting initial synaptic strengths was set to $c \times 10\%$ of the strength of nonzero synapses (see METHODS). Figure 14C (circles) shows increased LTP threshold in RA, $b_{RA} \to c \times b_{RA}$. Figure 14A (plus signs) shows increased synaptic strengths in RA, $[RA, RA] \to c \times [RA, RA]$ and $[RA, HR] \to c \times [RA, HR]$.
ACKNOWLEDGMENTS
We thank B. Baird, D. Buonomano, C. Linster, A. Krukowski, K. Miller, and members of the Doupe lab for many helpful comments. Special thanks to K. Miller for input and support throughout the project.
This work was supported by the McDonnell-Pew Program in Cognitive Neuroscience (T. W. Troyer), and National Institutes of Health Grants MH-12372 (T. W. Troyer) and MH-55987 and NS-34835 (A. J. Doupe).
FOOTNOTES
Present address and address for reprint requests: T. Troyer, Dept. of Psychology, University of Maryland, College Park, MD 20742 (E-mail: ttroyer{at}psyc.umd.edu).
The costs of publication of this article were defrayed in part by the payment of page charges. The article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
Received 16 February 2000; accepted in final form 2 May 2000.
REFERENCES