Today’s post tries to fit the theoretical concept of Predictive Coding with the unusual structure and connectivity of Pyramidal cells in the Neocortex.
Pyramidal neurons are interesting because they are one of the most common neuron types in the computational layers of the neocortex. This almost certainly means they are critical to many of the key cortical functions, such as forming representations of knowledge and reasoning about the world.
Anatomy of a Pyramidal Neuron
Pyramidal neurons are so-called because they tend to have a triangular body (soma). But this isn’t the most interesting feature! While all neurons have dendrites (inputs) and at least one axon (output), Pyramidal cells have more than one type of input – Basal and Apical dendrites.
Pyramidal neurons tend to have a single, long Apical dendrite that extends with few forks a long way from the body of the neuron. When it reaches layer 1 of the cortex (which contains mostly top-down feedback from cortical areas that are believed to represent more abstract concepts), the apical dendrite branches out. This suggests the apical dendrite likes to receive feedback input. If feedback represents more abstract, longer-term context, then this data would be useful for predicting bottom-up input. More on this later.
Pyramidal cells tend to have a few Basal dendrites that branch almost immediately, in the vicinity of the cell body. Note that this means the input provided to basal and apical dendrites is physically separated. We know from analysis of cortical microcircuits that axons terminating around the body of pyramidal cells in cortex layers 2 and 3 contain bottom-up data that is propagating in a feed-forward direction – i.e. information about the external state of the world.
Pyramidal cells have a single Axonal output that may fork, and may travel a very long distance to its targets including other areas of the cortex.
Predictive Coding (PC) is a method of transforming data from its original form to a representation in terms of prediction errors. There’s not much interest in PC in the Machine Learning community, but in Neuroscience there is substantial evidence that the Cortex encodes information in this way. Similar but unrelated concepts have also been used for efficient compression of data in signal processing. The benefit of this transformation comes from compression: we assume that only prediction errors are important because, by definition, everything else can be predicted and is therefore sufficiently described elsewhere.
There are several research groups looking at computational models of Predictive Coding – in particular those of Karl Friston and Andy Clark.
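To make the transformation concrete, here is a minimal sketch of encoding a signal as prediction errors and recovering it again. The predictor used here ("the next value equals the last one") and all function names are our own illustrative assumptions, not part of any particular Predictive Coding model.

```python
def last_value(history):
    """Hypothetical predictor: expect the previous sample again (0 if no history)."""
    return history[-1] if history else 0


def encode_errors(signal, predict):
    """Replace each sample with its prediction error (observed minus predicted)."""
    errors, history = [], []
    for x in signal:
        errors.append(x - predict(history))
        history.append(x)
    return errors


def decode_errors(errors, predict):
    """Rebuild the original signal from the errors using the same predictor."""
    history = []
    for e in errors:
        history.append(predict(history) + e)
    return history


signal = [5, 5, 5, 6, 6, 6, 6, 2]
errors = encode_errors(signal, last_value)
print(errors)                             # [5, 0, 0, 1, 0, 0, 0, -4]
print(decode_errors(errors, last_value))  # [5, 5, 5, 6, 6, 6, 6, 2]
```

Most of the encoded values are zero: only the samples the predictor got wrong carry any information, which is where the compression benefit comes from.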
Two uses for feedback
Assuming feedback contains a more processed and abstract representation of a broader set of data, it has two uses.
- Prediction for a more efficient representation of the world (e.g. Predictive Coding)
- Prediction for more robust interpretation (via integration of top-down information in perception)
Predictive coding aims to transform the representation inside the cortex to a more efficient one that encodes only the relationships between prediction errors. Take some time to decide for yourself whether this loses anything…!
But there are many perceptual phenomena that show how internal state affects perception and interpretation of external input. For example, the phenomenon of multistable perception in some visual illusions: We need to know what we’re looking for before we can see it, and we can deliberately change from one interpretation to another (see figure).
Now consider Bayesian inference, such as Belief Propagation, or Markov Random Fields – in all cases we combine a Prior (e.g. top-down feedback) with a Likelihood produced from current, bottom-up data. Good inference depends on effective integration of both inputs.
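As a toy illustration of combining a Prior with a Likelihood, the sketch below applies Bayes' rule to two competing interpretations of an ambiguous image. The interpretation labels and all of the probabilities are made-up values chosen purely for illustration.

```python
def posterior(prior, likelihood):
    """Bayes' rule over a discrete set of hypotheses: posterior is proportional to prior * likelihood."""
    unnormalised = {h: prior[h] * likelihood[h] for h in prior}
    total = sum(unnormalised.values())
    return {h: p / total for h, p in unnormalised.items()}


# Bottom-up data is perfectly ambiguous between the two interpretations.
likelihood = {"interpretation_a": 0.5, "interpretation_b": 0.5}

# Top-down feedback (the prior) favours one interpretation or the other.
expecting_a = posterior({"interpretation_a": 0.8, "interpretation_b": 0.2}, likelihood)
expecting_b = posterior({"interpretation_a": 0.2, "interpretation_b": 0.8}, likelihood)

print(max(expecting_a, key=expecting_a.get))  # interpretation_a
print(max(expecting_b, key=expecting_b.get))  # interpretation_b
```

With ambiguous bottom-up input, the top-down prior alone decides which interpretation wins, which is the flavour of effect seen in multistable perception.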
Ideally we would be able to resolve how both the modelling and inference benefits could be realized in the pyramidal cell, and how physical segregation of apical & basal dendrites might help this happen.
False-Negative Error Coding
The simplest scheme for predictive coding is to propagate only false-negative errors – where something was observed, but it was not predicted in advance. In this encoding, if the event was predicted, any output is suppressed. (Note: This assumes that another mechanism limits the number of false-positive errors – for example a homeostatic system to limit the total number of predictions.)
When a neuron fires, it represents a set of coincident input on a number of synapses. A pattern of input was observed. If the neuron was in a “predicted” state, immediately prior to firing, then we could safely suppress the output and achieve a simple predictive coding scheme. If a neuron is not in a predicted state when it fires, then the output should be propagated as normal.
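The rule described above can be sketched in a few lines. This is only an illustration over boolean flags, one per neuron; the function name and the example values are our own assumptions.

```python
def false_negative_output(fired, predicted):
    """Propagate a neuron's output only if it fired without having been predicted."""
    return [f and not p for f, p in zip(fired, predicted)]


fired     = [True, True,  False, False]
predicted = [True, False, True,  False]

print(false_negative_output(fired, predicted))  # [False, True, False, False]
```

Note that the third neuron (predicted, but did not fire) is a false-positive error and produces no output under this scheme, which is why a separate mechanism is needed to keep such predictions rare.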
False-Negative Error Coding in Pyramidal Cells
Since Pyramidal cells have two distinct inputs – basal and apical dendrites – we can implement false-negative coding as follows:
- Basal dendrites recognize patterns of bottom-up input; the neuron “represents” those patterns by generating a spike on its axonal output when stimulated by the basal dendrites.
- The apical dendrite learns to detect input that allows the cell’s spiking to be predicted. The apical dendrite determines the “predicted” state of the cell. Top-down feedback input is used for this purpose.
- If the cell is “predicted” when the basal dendrite tries to generate an output, then suppress that output.
- The cell internally self-regulates to ensure that it is rarely in a predicted state, and typically only at the right times.
- Physical segregation of the two dendrite types ensures that they can target feedback data for prediction and feed-forward data for classification.
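The list above can be sketched as a tiny cell model with two separate input compartments. This is a simplified sketch under strong assumptions: fixed weights, simple weighted sums and hard thresholds are our own choices, and both learning and the self-regulation step are omitted.

```python
from dataclasses import dataclass


@dataclass
class PyramidalCell:
    """Toy cell with segregated basal (feed-forward) and apical (feedback) inputs."""
    basal_weights: list
    apical_weights: list
    basal_threshold: float = 1.0   # assumed value
    apical_threshold: float = 1.0  # assumed value

    @staticmethod
    def _drive(weights, inputs):
        return sum(w * x for w, x in zip(weights, inputs))

    def is_predicted(self, feedback):
        """The apical dendrite alone decides whether the cell is in a predicted state."""
        return self._drive(self.apical_weights, feedback) >= self.apical_threshold

    def step(self, feedforward, feedback):
        """Output only when basal input is recognised and the cell was not predicted."""
        recognised = self._drive(self.basal_weights, feedforward) >= self.basal_threshold
        return recognised and not self.is_predicted(feedback)


cell = PyramidalCell(basal_weights=[0.6, 0.6, 0.0], apical_weights=[1.0, 0.0])

print(cell.step([1, 1, 0], [0, 0]))  # True:  pattern seen, not predicted
print(cell.step([1, 1, 0], [1, 0]))  # False: pattern seen, but predicted
print(cell.step([1, 0, 0], [0, 0]))  # False: pattern not recognised
```

Keeping the two weight vectors separate mirrors the physical segregation of the dendrites: feedback can only influence the predicted state, and feed-forward input can only influence recognition.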
Spike bursts (spike trains)
When Pyramidal cells fire, they usually don’t fire just once. They tend to generate a short sequence of spikes known as a “burst” or “train”. So it’s possible that False-Negative coding doesn’t completely eliminate the spike, but rather truncates the sequence of spikes to make the output far less significant and less likely to significantly drive activity in other cells. There may also be some benefit to being able to broadcast the event in a subtle way, perhaps as a form of timing signal.
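One way to picture truncation rather than outright suppression is shown below. This is a hypothetical sketch, not a description of how we will implement bursts: the burst lengths, the inter-spike interval and the function name are all assumed values for illustration.

```python
def burst_spike_times(start_ms, predicted, full_burst=4, truncated_burst=1, interval_ms=5):
    """Return spike times for one firing event; a predicted cell emits a shorter burst."""
    n_spikes = truncated_burst if predicted else full_burst
    return [start_ms + i * interval_ms for i in range(n_spikes)]


print(burst_spike_times(100, predicted=False))  # [100, 105, 110, 115]
print(burst_spike_times(100, predicted=True))   # [100]
```

In the predicted case a single spike still marks the time of the event, but it is much less likely to drive downstream cells than a full burst.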
So to find evidence for this theory, we could look for truncated or absent spike trains in the presence of predictive input to the apical dendrite. Specifically, we would want to observe that input causing a spike in the apical dendrite truncates or eliminates the spike train that would otherwise result from basal stimulation.
Is there any direct neurological evidence for different integration of spikes from Apical and Basal dendrites in Pyramidal cells? It turns out, yes, there is! Metz, Spruston and Martina (2007) say: “… our data present evidence for a dendritic segregation of Kv1-like channels in CA1 pyramidal neurons and identify a novel action for these channels, showing that they inhibit action potential bursting by restricting the size of the [afterdepolarization]”.
Now for the AI/ML audience it’s necessary to translate this a bit. An “action potential” “occurs when the membrane potential (voltage) of a specific axon location rapidly rises and falls. Action potentials in neurons are also known as “nerve impulses” or “spikes””. So bursting is the generation of a short sequence of rapid spikes.
So in other words, apical stimulation inhibits bursts of axonal output spikes from a pyramidal neuron. There’s our smoking gun!
According to this paper, the Apical dendrite uniquely inhibits the spike burst from the soma (the basal dendrites don’t). This matches the behaviour we would expect if pyramidal cells implement false-negative predictive coding via the different inputs to different dendrite types: If the apical dendrite fires, there’s no axonal burst. If there wasn’t a spike in the apical dendrite, but basal activity drives the cell over its threshold, then the cell output does burst.
Note there are many other papers with similar claims; we found search terms such as “differential basal apical dendrite integration” to be helpful.
Alexia E. Metz, Nelson Spruston and Marco Martina (2007). “Dendritic D-type potassium currents inhibit the spike afterdepolarization in rat hippocampal CA1 pyramidal neurons”. J. Physiol. 581.1, pp. 175–187.
We’ve seen how a simple model of pyramidal cell function, false-negative error coding, might account for two things at once: multistable perception, via the separation of feedback input to the apical dendrite from feed-forward input to the basal dendrites, and predictive coding.
Unlike existing models of predictive coding within the cortex, which often posit separate populations of cells representing predictions and residual errors (e.g. Rao and Ballard, 1999), we have proposed that coding could occur within the known biology of individual pyramidal cells, due to the different integration of apical and basal dendrite activity. At the same time, the proposed method allows feedback and feedforward information to be integrated within the same mechanism.
Over the next few months we’ll be testing some of these ideas in simulation!
I think I read the same explanation on Fergal Byrne’s blog (which seems to be down now).
How are you going to implement the “burst”? As a signal over x time steps or a weighted activation? And are you going to implement the truncation as separate from the lateral inhibition? Do you think prediction is exclusively top-down or lateral as well, like in HTM?
We are moving towards a Spiking-Neural-Network implementation because at the moment we believe that some of the good temporal coding properties cannot be effectively captured in a weighted activation. Look out for the next blog post on Spike-Timing Dependent Plasticity (STDP)!
I don’t think I’ve seen the same explanation but I’m aware Fergal has chewed over many of the same ideas. If you have a link that’d be good – he may well have different biological evidence that I’d like to see.
A lot of questions to answer! I think we will probably try something like a burst. Unsure exactly how. We have several major experiment algorithms under test. We like top-down and lateral prediction but will test them independently as well as together.
I can’t give you a link to a specific blog post; I’d have to go through all his posts via the Wayback Machine to find it. But I think the idea is also expressed in his online book, though in a somewhat confused way: “This last process is called bursting, and gives rise to a short-lived pattern which encodes exactly how well the column as an ensemble has matched its predictions. Basically, the more cells which fire, the more “confused” the match between prediction and reality.”
That sounds like predictive coding, but the other details he mentions don’t quite make sense to me in that regard.
It’s possible to come to similar ideas from different directions. I believe HTM has a simple feature where more cells “fire” if the output is unpredicted. This is just arriving at a similar conclusion from a different direction!
I think the difference is that they also model the “inhibitory sheath” which has its own kind of input and gives rise to the phenomenon of predictive coding (though the way that’s described in Byrne’s book doesn’t quite make sense to me). Byrne writes that the inhibitory sheath gets the same kind of bottom-up input as the rest of the column and only cells in a predictive state can beat it. But in that case predicted inputs should lead to bursts. To me it seems the inhibitory sheath should get the lateral input to be in a predictive state and then be triggered by a pyramidal cell firing. In that case predicted input would get one activation, unpredicted input would get a burst. But of course that doesn’t fit the fact that it’s the pyramidal cells that have the prediction mechanism. Ok, maybe the prediction mechanism of the pyramidal cells is only to beat other columns to the punch. So, if the inhibitory sheath gets lateral input, everything makes sense to me.
One thing I’ve learnt in neuroscience is that you can’t trust any one source. Only simulation/experimentation and repeated confirmation by multiple sources can really say definitively how it is really working. Nature likes to give everything 5 separate roles!
True. It is also extremely difficult to predict which functionality just emerges out of the mechanisms you already have an understanding of and which need an additional mechanism.
Anyway, after that long hiatus I’m really looking forward to more AGI-blog posts! Keep up the great work!