This blog will be a series of short articles about a set of common architectures for artificial general intelligence (AGI). Specifically, we will look at what Deep Belief Networks (DBNs) and Numenta’s Memory Prediction Framework (MPF) have in common. These days MPF is better known through its concrete implementations, CLA (Cortical Learning Algorithm) and HTM (Hierarchical Temporal Memory). For an introduction to Deep Belief Networks, read one of the papers by Hinton et al.
This blog will typically use the term MPF to collectively describe all the current implementations (CLA, HTM, NuPIC etc.). We see MPF as an interface or specification, and CLA and HTM as implementations of it.
Both MPFs and DBNs try to build efficient and useful hierarchical representations from patterns in input data. Both use unsupervised learning to define local variables to represent the state-space at a particular position in the hierarchy; modelling of the state in terms of these local variables – be they “sequence cells” or “hidden units” – constitutes a nonlinear transformation of the input. This means that both are “Deep Learning” methods. The notion of local variables within a larger graph relates this work to general Bayesian Networks and other graphical models.
We are also very interested in combining these structures with the representation and selection of behaviour, eventually resulting in the construction of an agent. This is a very exciting area of research that has not received significant attention.
Readers would be forgiven for not having noted any similarity between MPFs and DBNs. The literature rarely describes both in the same terms. In an attempt to clarify our perspective, we’ve included a phylogeny showing the relationships between these methods – of course, this is only one perspective. We’ve also noted some significant organisations using each method.
The remarkable uniformity of the neocortex
MPF/CLA/HTM aims to explain the function of the human neocortex. Deep Learning methods such as Convolutional Deep Neural Networks are explicitly inspired by cortical processing, particularly in the vision area. “Deep” means simply that the network has many layers; in earlier artificial neural networks, it was difficult to propagate signals through many layers, so only “shallow” networks were effective. “Deep” methods do some special (nonlinear) processing in each layer to ensure the propagated signal is meaningful, even after many layers of processing.
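One common way to see why each layer needs a nonlinearity is that a stack of purely linear layers collapses into a single linear layer, so the extra depth adds nothing. The sketch below illustrates this with two tiny layers. The weights `W1` and `W2`, the choice of `tanh`, and all function names are our own assumptions for illustration; they are not taken from any particular Deep Learning method.

```python
import math

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

# Hypothetical weights for two tiny layers (2 inputs -> 2 units -> 2 units).
W1 = [[1.0, -2.0], [0.5, 1.5]]
W2 = [[2.0, 1.0], [-1.0, 3.0]]

def linear_stack(x):
    """Two layers with no nonlinearity between them."""
    return matvec(W2, matvec(W1, x))

def collapsed(x):
    """A single layer whose weights are the matrix product W2 * W1."""
    return matvec(matmul(W2, W1), x)

def nonlinear_stack(x):
    """Two layers with a tanh nonlinearity after the first layer."""
    hidden = [math.tanh(h) for h in matvec(W1, x)]
    return matvec(W2, hidden)

x = [1.0, 2.0]
# The linear stack is equivalent to one layer ...
same = all(math.isclose(a, b) for a, b in zip(linear_stack(x), collapsed(x)))
# ... but the nonlinear stack is not.
differs = any(not math.isclose(a, b) for a, b in zip(nonlinear_stack(x), collapsed(x)))
print(same, differs)
```

With the nonlinearity in place, the second layer sees a transformed signal that no single linear layer could produce, which is what makes additional layers worthwhile.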
Figure: A cross-section of part of a cerebrum showing the cortex (darker outline). The distinctively furrowed brain appearance is an attempt to maximize surface area within a constrained volume. Image from Wikipedia.
Cortex is Latin for “bark” or “rind”, that is, an outer layer, and this surface layer is responsible for a lot of processing. The cortex covers the cerebrum, the top half of the brain. The processing happens in a thin layer on the surface, with the “filling” of the cerebrum being mainly connections between different areas of the cortex/surface.
It has been known for at least a century that the neocortex is remarkably similar in structure throughout, despite being associated with ostensibly very different brain functions such as speech, vision, planning and language. Early analysis of neuron connection patterns within the cortex revealed that it is organised into parallel stacks of tiny columns. The columns are highly connected internally, with limited connections to nearby columns. In other words, each column can be imagined as an independent processor of data.
Let’s assume you’re a connectionist: this means you believe the function of a neural network is determined by the degree and topology of the connections it has. Under that assumption, the uniform structure of the cortex suggests that the same algorithm is being used in each cortical column: the same functionality is repeated throughout the cortex despite being applied to very different data. This theory is supported by evidence of neural plasticity: cortical areas can change function if different data is provided to them, and can learn to interpret new inputs.
So, to explain the brain all we need to figure out is what’s happening in a typical cortical column and how the columns are connected!!*
(*a gross simplification, so prepare to be disappointed…!)
Neural Networks vs Graphical Models
Whether the function of a cortical column is described as a “neural network” or as a graphical model is irrelevant so long as the critical functionality is captured. Both MPF and Deep Belief Networks create tree-like structures of functionally-identical vertices that we can call a hierarchy. The processing vertices are analogous to columns; the white matter filling the cerebrum passes messages between the vertices of the tree. The tree might really be a different type of graph; we don’t know whether it is better to have more vertices in lower or higher levels.
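To make the idea of a tree of functionally-identical vertices concrete, here is a minimal sketch. Every name in it (`Vertex`, `process`, `run`) is hypothetical, and the "learning" step is a deliberately trivial stand-in: each vertex simply assigns a new id to every input pattern it has not seen before. It is not CLA, HTM or a DBN; it only shows the same code running at every vertex while messages flow up the tree.

```python
class Vertex:
    """One processing unit; every vertex runs the same toy algorithm."""

    def __init__(self, children=None):
        self.children = children or []   # empty list means a leaf vertex
        self.patterns = {}               # input pattern -> local state id

    def process(self, leaf_inputs):
        """Consume one input per leaf (left to right); return a local state id."""
        if self.children:
            # Internal vertex: its input is the tuple of messages from its children.
            pattern = tuple(child.process(leaf_inputs) for child in self.children)
        else:
            # Leaf vertex: its input is the next raw observation.
            pattern = next(leaf_inputs)
        # Toy unsupervised step: give each previously unseen pattern a new id.
        return self.patterns.setdefault(pattern, len(self.patterns))

def run(root, observations):
    """Feed one observation to each leaf and return the root's state id."""
    return root.process(iter(observations))

# A two-level tree: four leaves, two middle vertices, one root.
leaves = [Vertex() for _ in range(4)]
root = Vertex([Vertex(leaves[:2]), Vertex(leaves[2:])])

first = run(root, ["a", "b", "c", "d"])
repeat = run(root, ["a", "b", "c", "d"])   # same input, same root state
novel = run(root, ["a", "b", "c", "x"])    # one leaf changes, root state changes
print(first, repeat, novel)
```

Each vertex only ever sees its own local input, yet a change at a single leaf propagates upward as a change in the messages passed between levels.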
From representation to action
Deep Belief Networks have been particularly successful in the analysis of static images. MPF/CLA/HTM is explicitly designed to handle time-varying data. But neither is expressly designed to generate behaviour for an artificial agent.
Recently, a company called DeepMind combined Deep Learning and Reinforcement Learning to enable a computer program to play Atari games. Reinforcement Learning teaches an algorithm to associate states of the world and of itself with consequences by providing only a nonspecific “quality” function. The algorithm is then able to pick actions that maximize the quality expected in future states.
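As a rough illustration of learning from a nonspecific quality signal, the sketch below uses tabular Q-learning on a toy five-state corridor. This is a textbook technique chosen for brevity, not DeepMind's system (which pairs the idea with a deep network). The world, the reward of 1.0 at the goal, and the parameter values are all assumptions made for the example.

```python
import random

N_STATES = 5          # states 0..4 on a line; state 4 is the goal
ACTIONS = (0, 1)      # 0 = step left, 1 = step right

def step(state, action):
    """Deterministic toy world: reward 1.0 only on reaching the goal."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done

def choose_action(q, state, epsilon, rng):
    """Epsilon-greedy choice with random tie-breaking."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    best = max(q[state])
    return rng.choice([a for a in ACTIONS if q[state][a] == best])

def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Learn a table of expected quality for every (state, action) pair."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(1000):   # step cap guards against very long episodes
            action = choose_action(q, state, epsilon, rng)
            next_state, reward, done = step(state, action)
            # Quality target: immediate reward plus discounted best future quality.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q

def greedy_policy(q):
    """Pick the highest-quality action in each non-goal state."""
    return [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]

q_table = train()
policy = greedy_policy(q_table)
print(policy)
```

Nobody ever tells the agent that "right" is the correct action; it only receives a reward at the goal, and the preference for stepping right emerges as quality estimates propagate back along the corridor.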
Reinforcement Learning is the right type of feedback because it avoids the need to provide a “correct” response in every circumstance. For a “general” AI this is important, because it would require a working General Intelligence to define the “correct” response in all circumstances!
The direction taken by DeepMind is exactly what we want to do: automatic construction of a meaningful hierarchical representation of the world and the agent, in combination with reinforcement learning to allow prediction of state quality. Technically, the problem of picking a suitable action for each state is formalised as a Markov Decision Process (MDP). But often the true state of the world is not directly measurable; instead, we can only measure some “evidence” of the world-state, and must infer the true state. This harder task is called a Partially-Observable MDP (POMDP).
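Inferring the true state from evidence is usually done by maintaining a belief, a probability distribution over possible states, and updating it with Bayes' rule after each action and observation. The sketch below shows one such update for a hypothetical two-state world (a door that is open or closed, seen through a noisy sensor). The state names, probabilities and function name are assumptions made for illustration.

```python
def update_belief(belief, action, observation, transition, observation_model):
    """One Bayes-filter step: predict with the transition model, then
    reweight each state by how well it explains the observation."""
    states = list(belief)
    # Prediction: where could we be after taking the action?
    predicted = {
        s2: sum(transition[action][s1][s2] * belief[s1] for s1 in states)
        for s2 in states
    }
    # Correction: weight each state by P(observation | state).
    unnormalised = {
        s: observation_model[s][observation] * predicted[s] for s in states
    }
    total = sum(unnormalised.values())
    if total == 0.0:
        raise ValueError("observation has zero probability under the model")
    return {s: p / total for s, p in unnormalised.items()}

# Hypothetical transition model: P(next state | action, current state).
TRANSITION = {
    "wait": {"open": {"open": 1.0, "closed": 0.0},
             "closed": {"open": 0.0, "closed": 1.0}},
    "push": {"open": {"open": 1.0, "closed": 0.0},
             "closed": {"open": 0.8, "closed": 0.2}},
}
# Hypothetical noisy sensor: P(observation | true state).
OBSERVATION_MODEL = {
    "open": {"see_open": 0.6, "see_closed": 0.4},
    "closed": {"see_open": 0.2, "see_closed": 0.8},
}

belief = {"open": 0.5, "closed": 0.5}
belief = update_belief(belief, "wait", "see_open", TRANSITION, OBSERVATION_MODEL)
print(belief)
```

In a POMDP the agent chooses actions based on this belief rather than on the unobservable true state, which is why a good learned representation of state matters so much.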
An adaptive memory-prediction framework
In summary, this blog is concerned with algorithms and architectures for artificial general intelligence, which we will approach by tackling POMDPs using unsupervised hierarchical representations of the state space and reinforcement learning for action selection. Using Hawkins et al.’s MPF concept for the representation of state-space as a hierarchical sequence-memory, and adding adaptive behaviour selection via reinforcement learning, we arrive at the adaptive memory-prediction framework (AMPF).
This continues a theme we developed in an earlier paper (“Generating adaptive behaviour within a memory-prediction framework”).
Since that publication we have been developing more scalable methods and aim to release a new software package in 2014. In the meantime we will use this blog to provide context and discussion of new ideas.