Centre for Brain and Cognitive Development, Birkbeck College, London, UK
M. W. Spratling, Centre for Brain and Cognitive Development, Birkbeck College, 32 Torrington Square, London WC1E 7JL, UK. Email: m.spratling@bbk.ac.uk.
Introduction
This observation has formed the basis for many theories of receptive field formation, and is an essential feature of many computational (neural network) models of cortical function (von der Malsburg, 1973; Rumelhart and Zipser, 1985; Grossberg, 1987; Földiák, 1989, 1990, 1991; Oja, 1989; Sanger, 1989; Hertz et al., 1991; Ritter et al., 1992; Sirosh and Miikkulainen, 1994; Marshall, 1995; Swindale, 1996; Wallis, 1996; Kohonen, 1997; O'Reilly, 1998). Such neural network algorithms have also found application beyond the neurosciences as a means of data analysis, classification and visualization in a huge variety of fields. These algorithms vary greatly in the details of their implementation. In some, competition is achieved explicitly by using lateral connections between the nodes of the network (von der Malsburg, 1973; Földiák, 1989, 1990; Oja, 1989; Sanger, 1989; Sirosh and Miikkulainen, 1994; Marshall, 1995; Swindale, 1996; O'Reilly, 1998), while in others competition is implemented implicitly through a selection process which chooses the winning node(s) (Rumelhart and Zipser, 1985; Grossberg, 1987; Földiák, 1991; Hertz et al., 1991; Ritter et al., 1992; Wallis, 1996; Kohonen, 1997). However, in all of these algorithms nodes compete for the right to generate a response to the current pattern of input activity. A node's success in this competition is dependent on the total strength of the stimulation it receives, and nodes which compete unsuccessfully have their output activity suppressed. This class of models can thus be described as implementing post-integration inhibition.
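As a point of comparison for what follows, a minimal caricature of this class of algorithms is hard winner-take-all selection, in which each node first integrates all of its input and the most strongly driven node then suppresses the rest. This sketch stands in for the general idea shared by the many variants cited above, not for any one of them:

```python
import numpy as np

def winner_take_all(W, x):
    """Post-integration competition, in caricature: integrate first,
    then compete on the basis of total activation."""
    y = W @ x                # each node integrates all of its input
    out = np.zeros_like(y)
    winner = np.argmax(y)    # the most strongly stimulated node wins...
    out[winner] = y[winner]  # ...and the losers' outputs are suppressed
    return out
```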
Inhibitory contacts also occur on the dendrites of cortical pyramidal cells (Kim et al., 1995; Rockland, 1998), and certain classes of interneuron (e.g. double bouquet cells) specifically target dendritic spines and shafts (Tamas et al., 1997; Mountcastle, 1998). Such contacts would have relatively little impact on excitatory inputs more proximal to the cell body or on the action of synapses on other branches of the dendritic tree. Thus these synapses do not appear to contribute to post-integration inhibition. However, such synapses are likely to have strong inhibitory effects on inputs within the same dendritic branch that are more distal to the site of inhibition (Rall, 1964; Koch et al., 1983; Segev, 1995; Borg-Graham et al., 1998; Koch and Segev, 2000). Hence, they could potentially selectively inhibit specific groups of excitatory inputs. Related synapses cluster together within the dendritic tree, so that local operations are performed by multiple, functionally distinct, dendritic subunits before integration at the soma (Mel, 1994, 1999; Segev, 1995; Segev and Rall, 1998; Häusser et al., 2000; Koch and Segev, 2000; Häusser, 2001). Dendritic inhibition could thus act to block the output from individual functional compartments. It has long been recognized that a dendrite composed of multiple subunits would provide a significant enhancement to the computational powers of an individual neuron (Mel, 1993, 1994, 1999) and that dendritic inhibition could contribute to this enhancement (Koch et al., 1983; Segev and Rall, 1998; Koch and Segev, 2000). However, the role of dendritic inhibition in competition between cells, and its subsequent effect on neural coding and receptive field properties, has not previously been investigated.
We introduce a neural network model which demonstrates that competition via dendritic inhibition significantly enhances the computational properties of networks of neurons. As with models of post-integration inhibition, we simplify reality by combining the action of inhibitory interneurons into direct inhibitory connections between nodes. Furthermore, we group all the synapses contributing to a dendritic compartment together as a single input. Dendritic inhibition is then modeled as (linear) inhibition of this input. The algorithm is described fully in the Methods section, but essentially it operates by causing each node to attempt to block its preferred inputs from activating other nodes. It is thus described as pre-integration inhibition.
We illustrate the advantages of this form of competition with the aid of a few simple tasks that have been used previously to demonstrate the pattern recognition abilities required by models of the human perceptual system (Nigrin, 1993; Marshall, 1995; Marshall and Gupta, 1998). Although these tasks appear to be trivial, succeeding in all of them is beyond the abilities of single-layer neural networks using post-integration inhibition. These tasks demonstrate that pre-integration inhibition (in contrast to post-integration inhibition) enables a neural network to respond simultaneously to multiple stimuli, to distinguish overlapping stimuli, and to deal correctly with incomplete and ambiguous stimuli.
Methods
For the simulation shown in Figure 5 a bias was added to the activation of one node. This was implemented by adding 0.1 to the activation of that node during competition. Experiments showed that this bias could occur at any time (and for any duration) prior to the inhibition strength reaching a value of 1.5 to generate the same result. Although results are not shown here, this method is not restricted to working with binary encodings of input patterns and works equally well with analog encodings.
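To make the scheme concrete, the following is a minimal sketch of competition via pre-integration lateral inhibition, in the spirit of the description in the Introduction: each input to a node is inhibited in proportion to the activity of rival nodes that strongly prefer that same input, before integration takes place. The update rule, the ramping of the inhibition strength, and all parameter values (`steps`, `alpha_max`, `damping`) are illustrative assumptions, not the exact formulation used for the simulations reported here:

```python
import numpy as np

def compete(W, x, steps=50, alpha_max=2.0, damping=0.5):
    """Pre-integration lateral inhibition (illustrative sketch).

    Each node attempts to block its preferred inputs from activating
    rival nodes: every input to node j is inhibited in proportion to
    the activity of the most active rival, scaled by that rival's
    (normalized) weight to the same input.  The inhibition strength
    ramps up from zero so that an initially fair comparison gradually
    hardens into a decision.
    """
    n_nodes, n_inputs = W.shape
    y = np.zeros(n_nodes)
    for t in range(steps):
        alpha = alpha_max * (t + 1) / steps      # inhibition ramps up
        y_new = np.empty(n_nodes)
        for j in range(n_nodes):
            inhibition = np.zeros(n_inputs)
            for k in range(n_nodes):
                if k != j and W[k].max() > 0:
                    inhibition = np.maximum(
                        inhibition, alpha * y[k] * W[k] / W[k].max())
            # inhibition acts on the inputs, *before* integration
            x_eff = np.clip(x * (1.0 - inhibition), 0.0, None)
            y_new[j] = W[j] @ x_eff              # integration at the soma
        y = (1 - damping) * y + damping * y_new  # damped update for stability
    return y
```

In this sketch, the bias used for the simulation in Figure 5 would correspond to adding 0.1 to one node's activation inside the competition loop.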
Results
In many situations distinct sensory events will share many features. If such situations are to be distinguished, it is necessary for different sets of neurons to respond despite this overlap in input features. As a simple example, consider the task of representing two overlapping patterns: ab and abc. A network consisting of two nodes receiving input from three sources (labeled a, b and c) should be sufficient. However, because these input patterns overlap, when the pattern ab is presented the node representing abc will be partially activated, while when the pattern abc is presented the node representing ab will be fully activated.
When the synaptic weights have certain values both nodes will respond with equal strength to the same pattern. For example, when the weights are all equal, both nodes will respond to pattern ab with equal strength (Marshall, 1995). Similarly, when the total synaptic weight from each input is normalized (post-synaptic normalization) both nodes will respond equally to pattern ab (Marshall, 1995). When the total synaptic weight to each node is normalized (pre-synaptic normalization) both nodes will respond to pattern abc with equal activation (Marshall, 1995). Under all these conditions the response fails to distinguish between distinct input patterns, and post-integration inhibition can do nothing to resolve the situation (and will, in general, result in a node chosen at random winning the competition).
Several solutions to this problem have been suggested. Some require adjusting the activations using a function of the total synaptic weight received by the node [i.e. using the Weber Law (Marshall, 1995) or a masking field (Cohen and Grossberg, 1987; Marshall, 1995)]. These solutions scale badly with the number of overlapping inputs, and do not work when (as is common practice in many neural network models) the total synaptic weight to each node is normalized. Other suggestions have involved tailoring the lateral weights to ensure the correct node wins the competition (Földiák, 1990; Marshall, 1995). These methods work well (Marshall, 1995), but fail to meet other criteria as discussed below.
The most obvious, but most overlooked, solution would be to remove constraints placed on allowable values for synaptic weights (e.g. normalization) which serve to prevent the input patterns being distinguished in weight space. It is simple to invent sets of weights which unambiguously classify the two overlapping patterns (e.g. if both weights to the node representing ab are 0.5 and each weight to the node representing abc is 0.4, then each node responds most strongly to its preferred pattern and could then successfully inhibit the activation of the other node).
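This is easy to verify numerically. The snippet below (illustrative only; the weight values are the ones suggested above) checks that with unconstrained weights each node responds most strongly to its own pattern, whereas with pre-synaptic normalization the two nodes tie on pattern abc:

```python
import numpy as np

# Unconstrained weights can separate the overlapping patterns:
W_free = np.array([[0.5, 0.5, 0.0],    # node for ab
                   [0.4, 0.4, 0.4]])   # node for abc
ab = np.array([1.0, 1.0, 0.0])
abc = np.array([1.0, 1.0, 1.0])
print(W_free @ ab)    # [1.0, 0.8] -> the ab node responds most strongly
print(W_free @ abc)   # [1.0, 1.2] -> the abc node responds most strongly

# With pre-synaptic normalization (each node's weights sum to one),
# both nodes respond equally to abc and competition on total
# activation can only break the tie at random:
W_norm = np.array([[0.5, 0.5, 0.0],
                   [1/3, 1/3, 1/3]])
print(W_norm @ abc)   # [1.0, 1.0] -> an unresolvable tie
```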
Using pre-integration lateral inhibition, overlapping patterns can be successfully distinguished even when normalization is used (either pre- or post-synaptic normalization). Figure 2 shows the response of such a network to all possible input patterns. The two networks on the right show that the correct response is generated to input patterns ab and abc. The other networks show that when partial input patterns are presented the node that represents the most similar pattern is activated in proportion to the degree of overlap between the partial pattern and the preferred input of that node. Hence, when the input is a or b, which partially matches both of the training patterns, then the node representing the smallest pattern responds since these partial patterns are more similar to ab than to abc. When the input is c this partially matches only one of the training patterns and hence the node representing abc responds. Similarly, patterns bc and ac most strongly resemble abc and hence cause activation of that node.
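To see how pre-integration inhibition achieves this, the `compete()` sketch from the Methods section can be run on this two-node network. This is again a qualitative illustration under the sketch's assumed dynamics, not the exact simulation code used to produce Figure 2:

```python
# Nodes representing ab and abc, with pre-synaptic normalization
# (each node's weights sum to one), as in Figure 2.
W = np.array([[0.5, 0.5, 0.0],         # node 1: prefers ab
              [1/3, 1/3, 1/3]])        # node 2: prefers abc
for name, x in [("ab", [1, 1, 0]), ("abc", [1, 1, 1]), ("a", [1, 0, 0]),
                ("c", [0, 0, 1]), ("bc", [0, 1, 1])]:
    print(name, np.round(compete(W, np.array(x, dtype=float)), 2))
# Qualitatively: node 1 alone responds to ab (and to a or b alone);
# node 2 alone responds to abc (and to c, bc or ac), with partial
# patterns producing responses proportional to their overlap.
```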
While it is sufficient in certain circumstances for a single node to represent the input (local coding), it is desirable in many other situations to have multiple nodes providing a factorial or distributed representation. As an extremely simple example, consider three inputs (a, b and c), each of which is represented by one of three nodes. Any pattern of inputs can be represented by having zero, one or multiple nodes active. In this particular case the input to the network provides just as good a representation as the output, so there is little to be gained. However, this example captures the essence of other, more realistic, tasks in which multiple nodes, each of which represents multiple inputs, may need to be active.
Post-integration lateral inhibition can be modified to enable multiple nodes to be active (Földiák, 1990; Marshall, 1995) by weakening the strength of the competition between those pairs of nodes that need to be co-active (the lateral weights need to reach a compromise strength which provides sufficient competition for distinct patterns while allowing multiple nodes to respond to multiple patterns). This either requires a priori knowledge of which nodes will be co-active or the ability to learn appropriate lateral weights. However, information locally available at a synapse is insufficient to determine if the correct compromise weights have been reached (Spratling, 1999), and it is thus necessary to add further constraints to derive a learning rule. The proposed constraints require that all input patterns occur with equal probability and that pairs of nodes are co-active with equal frequency (Földiák, 1990; Marshall, 1995). These constraints severely restrict the class of problems that can be successfully represented to those in which all input patterns are mutually exclusive or in which all pairs of input patterns occur simultaneously with equal frequency. As an example of a case for which these networks would fail, consider using a single network to represent the color and shape of an object. At any given time only one node (or group of nodes) representing a single color and one node (or group of nodes) representing a single shape should be active. There thus needs to be strong inhibition between nodes representing properties within the same class, and weak inhibition between nodes representing different properties. This task fails to match the requirements implicitly defined in the learning rules, and application of those rules would lead to weakening of lateral inhibition within each class until multiple color nodes and multiple shape nodes were co-active with equal frequency. Hence, post-integration lateral inhibition, implemented using explicit lateral weights, fails to provide factorial coding except in the exceptional case in which all pairs of patterns co-occur together, or in which external knowledge is available to set appropriate lateral weights.
Networks in which competition is implemented using a selection mechanism can also be modified to allow multiple nodes to be simultaneously active (e.g. k-winners-take-all). However, these networks also restrict the tasks that can be successfully represented to those in which a pre-defined number of nodes must be active in response to every pattern of stimuli.
In contrast, pre-integration lateral inhibition places no restrictions on the number of active nodes, nor on the frequency with which nodes, or pairs of nodes, are active. Such a network can thus respond appropriately to any combination of input patterns; for example, it can directly solve the problem of representing any arbitrary combination of the inputs a, b and c (as illustrated below). A more challenging problem is shown in Figure 3. Here nodes represent six overlapping patterns. The network responds correctly to each of these patterns and to multiple, overlapping, patterns (even in cases where only partial patterns are presented).
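For instance, running the `compete()` sketch from the Methods section on the trivial three-node example above (again, an illustration under the sketch's assumed dynamics):

```python
# Three nodes, each representing one input (a, b or c).  Because no
# node's preferred inputs overlap with another's, inhibition never
# engages and any number of nodes can be active at once.
W = np.eye(3)
print(np.round(compete(W, np.array([1.0, 0.0, 1.0])), 2))
# -> nodes a and c both respond at full strength
```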
In some circumstances there simply is no correct parsing of the input pattern. Consider a neural network with two nodes and three inputs (a, b and c). If one node represents the pattern ab and the other represents the pattern bc, then the input b is ambiguous since it equally matches the preferred input of both nodes. In this situation, most implementations of post-integration lateral inhibition would allow one node, chosen at random, to be active at half its normal strength. An alternative implementation is to use weaker lateral weights to enable both nodes to respond with one-quarter of the maximum response (Marshall, 1995; Marshall and Gupta, 1998). However, this approach is also unsatisfactory since it suggests that one-quarter of each pattern is present, when this is not the case. Neither of these activity patterns seems to provide an appropriate representation: any response in which both nodes generate equal activity suggests that a single piece of data provides evidence for two interpretations simultaneously, while any response in which one node has higher activity than the other makes an unjustified, arbitrary selection. Pre-integration lateral inhibition avoids generating responses that are not justified by the available data by preventing any response (Fig. 4). It thus produces no representation of the input rather than a potentially misleading representation.
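The same sketch illustrates the point, with the caveat that this simplified update rule only drives the responses toward zero rather than silencing them exactly, as the full formulation does (Fig. 4):

```python
# Nodes representing ab and bc receive the ambiguous input b.
W = np.array([[0.5, 0.5, 0.0],   # node 1: prefers ab
              [0.0, 0.5, 0.5]])  # node 2: prefers bc
print(np.round(compete(W, np.array([0.0, 1.0, 0.0])), 2))
# Both responses end up suppressed well below the 0.5 that an
# unambiguous half-match would produce; stronger inhibition (a larger
# alpha_max) suppresses them further.
```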
Discussion
Computational considerations have led us to suggest that competition via dendritic inhibition could significantly enhance the information-processing capacities of networks of cortical neurons. This claim is anatomically plausible, since it has been shown that cortical pyramidal cells innervate inhibitory cell types which, in turn, form synapses on the dendrites of pyramidal cells (Buhl et al., 1997; Tamas et al., 1997). However, determining the functional role of these connections will require further experimental evidence. Our model predicts that it should be possible to find pairs of cortical pyramidal cells for which action potentials generated by one cell induce inhibitory post-synaptic potentials within the dendrites of the other. Independent of such experimental support, the algorithm we have presented could have immediate advantages for many neural network applications across a wide variety of fields.
References
Buhl EH, Tamas G, Szilagyi T, Stricker C, Paulsen O, Somogyi P (1997) Effect, number and location of synapses made by single pyramidal cells onto aspiny interneurones of cat visual cortex. J Physiol 500:689–713.

Cohen MA, Grossberg S (1987) Masking fields: a massively parallel neural architecture for learning, recognizing, and predicting multiple groupings of patterned data. Appl Optics 26:1866–1891.

Földiák P (1989) Adaptive network for optimal linear feature extraction. In: Proceedings of the IEEE/INNS International Joint Conference on Neural Networks, Vol. 1, pp. 401–405. New York: IEEE Press.

Földiák P (1990) Forming sparse representations by local anti-Hebbian learning. Biol Cybern 64:165–170.

Földiák P (1991) Learning invariance from transformation sequences. Neural Comput 3:194–200.

Gray CM (1999) The temporal correlation hypothesis of visual feature integration: still alive and well. Neuron 24:31–47.

Grossberg S (1987) Competitive learning: from interactive activation to adaptive resonance. Cogn Sci 11:23–63.

Häusser M (2001) Synaptic function: dendritic democracy. Curr Biol 11:R10–R12.

Häusser M, Spruston N, Stuart GJ (2000) Diversity and dynamics of dendritic signalling. Science 290:739–744.

Hertz J, Krogh A, Palmer RG (1991) Introduction to the theory of neural computation. Redwood City, California: Addison-Wesley.

Kim HG, Beierlein M, Connors BW (1995) Inhibitory control of excitable dendrites in neocortex. J Neurophysiol 74:1810–1814.

Koch C, Poggio T, Torre V (1983) Nonlinear interactions in a dendritic tree: localization, timing, and role in information processing. Proc Natl Acad Sci USA 80:2799–2802.

Koch C, Segev I (2000) The role of single neurons in information processing. Nature Neurosci Suppl 3:1171–1177.

Kohonen T (1997) Self-organizing maps. Berlin: Springer.

Marshall JA (1995) Adaptive perceptual pattern recognition by self-organizing neural networks: context, uncertainty, multiplicity, and scale. Neural Netw 8:335–362.

Marshall JA, Gupta VS (1998) Generalization and exclusive allocation of credit in unsupervised category learning. Netw Comput Neural Syst 9:279–302.

Mel BW (1993) Synaptic integration in an excitable dendritic tree. J Neurophysiol 70:1086–1101.

Mel BW (1994) Information processing in dendritic trees. Neural Comput 6:1031–1085.

Mel BW (1999) Why have dendrites? A computational perspective. In: Dendrites (Stuart G, Spruston N, Häusser M, eds), pp. 271–289. Oxford: Oxford University Press.

Mountcastle VB (1998) Perceptual neuroscience: the cerebral cortex. Cambridge, Massachusetts: Harvard University Press.

Nigrin A (1993) Neural networks for pattern recognition. Cambridge, Massachusetts: MIT Press.

Oja E (1989) Neural networks, principal components, and subspaces. Int J Neural Syst 1:61–68.

O'Reilly RC (1998) Six principles for biologically based computational models of cortical cognition. Trends Cogn Sci 2:455–462.

Rall W (1964) Theoretical significance of dendritic trees for neuronal input–output relations. In: Neural theory and modeling (Reiss RF, ed.), pp. 73–97. Stanford, California: Stanford University Press.

Reynolds JH, Desimone R (1999) The role of neural mechanisms of attention in solving the binding problem. Neuron 24:19–29.

Ritter H, Martinetz T, Schulten K (1992) Neural computation and self-organizing maps: an introduction. Reading, Massachusetts: Addison-Wesley.

Rockland KS (1998) Complex microstructures of sensory cortical connections. Curr Opin Neurobiol 8:545–551.

Roelfsema PR, Lamme VAF, Spekreijse H (2000) The implementation of visual routines. Vision Res 40:1385–1411.

Rumelhart DE, Zipser D (1985) Feature discovery by competitive learning. Cogn Sci 9:75–112.

Sanger TD (1989) Optimal unsupervised learning in a single-layer linear feedforward neural network. Neural Netw 2:459–473.

Segev I (1995) Dendritic processing. In: The handbook of brain theory and neural networks (Arbib MA, ed.), pp. 282–289. Cambridge, Massachusetts: MIT Press.

Segev I, Rall W (1998) Excitable dendrites and spines: earlier theoretical insights elucidate recent direct observations. Trends Neurosci 21:453–460.

Singer W (1999) Neuronal synchrony: a versatile code for the definition of relations? Neuron 24:49–65.

Sirosh J, Miikkulainen R (1994) Cooperative self-organization of afferent and lateral connections in cortical maps. Biol Cybern 71:66–78.

Somogyi P, Martin KAC (1985) Cortical circuitry underlying inhibitory processes in cat area 17. In: Models of the visual cortex (Rose D, Dobson VG, eds), chapter 54. Chichester, UK: Wiley.

Spratling MW (1999) Artificial ontogenesis: a connectionist model of development. PhD thesis, Department of Artificial Intelligence, University of Edinburgh.

Swindale NV (1996) The development of topography in the visual cortex: a review of models. Netw Comput Neural Syst 7:161–247.

Tamas G, Buhl EH, Somogyi P (1997) Fast IPSPs elicited via multiple synaptic release sites by different types of GABAergic neurone in the cat visual cortex. J Physiol 500:715–738.

Thorpe SJ (1995) Localized versus distributed representations. In: The handbook of brain theory and neural networks (Arbib MA, ed.), pp. 549–552. Cambridge, Massachusetts: MIT Press.

von der Malsburg C (1973) Self-organisation of orientation sensitive cells in the striate cortex. Kybernetik 14:85–100.

von der Malsburg C (1981) The correlation theory of brain function. Technical Report 81-2, Max Planck Institute for Biophysical Chemistry.

Wallis G (1996) Using spatio-temporal correlations to learn invariant object recognition. Neural Netw 9:1513–1519.