We’re very happy to report that we recently published a preprint on AHA, an ‘Artificial Hippocampal Algorithm’ for Episodic Machine Learning. It’s the culmination of a multi-year research project and is a starting point for the next wave of developments. This article explains the motivation for developing AHA and briefly describes the research.
Note: headline image from https://www.news-medical.net/health/Hippocampus-Functions.aspx
Recent Success of Machine Learning (ML)
There’s a lot of excitement about the recent successes of AI powered by machine learning. It has been applied to great effect across many problem domains. However, some experts have been vocal about the limitations of mainstream ML and in particular Deep Learning (examples here, here and here).
The mainstream solutions are predominantly based on slow statistical learning. They require a very large amount of data, usually labelled, and must be retrained from scratch rather than learning incrementally on top of existing knowledge. Additionally, there is an assumption that the data source, and hence the environment, is unchanging, i.e. samples are typically assumed to be i.i.d. (independent and identically distributed).
In contrast, animals learn new concepts quickly in dynamic environments without re-learning of existing knowledge.
Biological learning has a range of desirable characteristics:
- Ability to learn and reason about specific instances, even if they are very similar
- Sample efficient learning, e.g. one-shot, or learning from only one experience
- Generalisation to other experiences
- Recognition of complete entities robustly from partial cues
- Short-Term Memory of recent experiences
- Learning without labels
- Robustness to imperfect sensory input (e.g. presence of noise and occlusion)
- Continual learning of new knowledge, re-using existing concepts
- Selective retention of salient knowledge as Long Term Memory
These are desirable qualities for many systems. For example, it is difficult and costly to gather huge labelled datasets for current industrial applications. But more than that, replicating these capabilities is essential for robots to finally be capable of operating in realistic cluttered and dynamic environments like the home.
It is difficult to imagine operating in the world without the ability to learn quickly enough to recall the specifics of the day, distinguish between similar experiences and objects and use new knowledge immediately. Imagine seeing every cup as a category rather than the cup that has your coffee! These skills also enable one to tell an autobiographical story and have a sense of identity, and by extension, a sense of responsibility – critical for ethical AI systems.
The central capability is to learn a ‘unique combination of concepts’ immediately. Learning with one example is called ‘one-shot learning’. If you include yourself as one of those concepts, it enables you to tell an autobiographical story from your unique series of experiences. This is known as Episodic Learning, and is why we talk about contributing to Machine Episodic Learning.
Bringing it together
It is widely accepted that the mammalian brain region known as the Hippocampus is essential for Episodic and Semantic Learning (expanded upon in a recent post). In this work, we set out to better understand and replicate the hippocampus to achieve the range of animal learning qualities described above. DeepMind has also promoted this direction recently.
Some concrete examples
You know about forks and spoons already through millions of varied experiences. Imagine that somebody now shows you a ‘spork’. You only need to see it once, because you’re familiar with the high level concepts that make up spoons and forks, and you learn this new combination. From now on, you’ll be able to generalize and recognise other sporks in a variety of conditions.
You and your friend receive coffee in mugs from the same mug set. You’ve never seen them before and they are almost identical, except for a stain, the position they were put down, the height of the contents or some other small detail. Despite this, it would be easy for you to know which one is yours.
Building better AI
The hippocampus is understood to learn quickly and retain knowledge over short time spans, on the order of days. During that time it is used for recognition as well as to selectively consolidate memories into the neocortex, which performs slow statistical learning. The hippocampal region and neocortex appear to be complementary structures that together enable the range of flexible learning we observe in animals.
This defines a high level framework for enhancing standard machine learning models. These standard models comprise the long term memory, capable of slow statistical learning of categories.
The fast learner creates very distinct representations so that they don’t interfere, allowing separation of similar concepts. Moreover, the representations are highly compressed, which makes it practical to store experiences consisting of multiple high dimensional sensory streams.
The system can reconstruct the high dimensional inputs for:
- Recognition: when exposed to something similar, reconstruct the original stimulus for that AHA moment, after only one exposure.
- Consolidation: replay memories to the long term memory to learn the salient information and forget the rest
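To make the consolidation idea more concrete, here is a minimal sketch of a fast store replaying episodes to a slow learner. It is not the AHA implementation: the class names `FastMemory` and `SlowLearner`, the prototype-per-label learner, the learning rate and the toy vectors are all assumptions chosen for illustration, and the sketch does not model selective forgetting.

```python
import random


class FastMemory:
    """Hypothetical episodic buffer: stores each experience after one exposure."""

    def __init__(self):
        self.episodes = []

    def memorize(self, label, vector):
        self.episodes.append((label, list(vector)))

    def replay(self, rng):
        """Return all stored episodes in a random order."""
        shuffled = list(self.episodes)
        rng.shuffle(shuffled)
        return shuffled


class SlowLearner:
    """Hypothetical long-term memory: one prototype per label, small updates."""

    def __init__(self, learning_rate=0.05):
        self.learning_rate = learning_rate
        self.prototypes = {}

    def update(self, label, vector):
        proto = self.prototypes.setdefault(label, [0.0] * len(vector))
        for i, x in enumerate(vector):
            # Move the prototype a small step toward the replayed episode.
            proto[i] += self.learning_rate * (x - proto[i])


def consolidate(fast, slow, replays=200, seed=0):
    """Replay every stored episode `replays` times to the slow learner."""
    rng = random.Random(seed)
    for _ in range(replays):
        for label, vector in fast.replay(rng):
            slow.update(label, vector)


fast = FastMemory()
fast.memorize("mug", [1.0, 0.0, 1.0])      # seen once
fast.memorize("spork", [0.0, 1.0, 1.0])    # seen once
slow = SlowLearner()
consolidate(fast, slow)
print(slow.prototypes)
```

Each episode is written to the fast store exactly once, while the slow learner only gets close to it after many small replayed updates. That split is the point of the sketch.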
Why is it difficult?
As mentioned above, to learn an episode in one-shot, you must create distinct and non-interfering patterns. You must exaggerate slight differences in the input. But the goal of generalization (and hence recognition) is achieved by representing similar things in similar ways. These capabilities are seemingly at odds.
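One common way to model pattern separation is to expand the input into a larger population of units and keep only the few most active ones (k-winners-take-all). The sketch below illustrates that general technique under stated assumptions: the random Gaussian projection, the sizes, the seed and the helper names are hypothetical and are not taken from the paper. It prints how much two similar inputs overlap before and after coding. With sparse codes the code overlap is typically lower than the input overlap, though the exact numbers depend on the random projection.

```python
import random


def make_projection(n_inputs, n_units, seed=0):
    """Fixed random weights from inputs to a larger population of units."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(n_inputs)]
            for _ in range(n_units)]


def sparse_code(x, projection, k):
    """Return the indices of the k most active units (k-winners-take-all)."""
    activations = [sum(w * xi for w, xi in zip(row, x)) for row in projection]
    ranked = sorted(range(len(activations)),
                    key=lambda i: activations[i], reverse=True)
    return set(ranked[:k])


def overlap(a, b):
    """Fraction of active elements shared by two equally sized sets."""
    return len(a & b) / len(a)


# Two similar binary inputs: they differ in 2 of their 8 active positions.
x1 = [1] * 8 + [0] * 12
x2 = [1] * 6 + [0, 0, 1, 1] + [0] * 10

projection = make_projection(n_inputs=20, n_units=200)
code1 = sparse_code(x1, projection, k=10)
code2 = sparse_code(x2, projection, k=10)

active1 = {i for i, v in enumerate(x1) if v}
active2 = {i for i, v in enumerate(x2) if v}
print("input overlap:", overlap(active1, active2))
print("code overlap: ", overlap(code1, code2))
```

Identical inputs always map to identical codes, so a scheme like this gives up some generalisation in exchange for separation. That is exactly the tension described above.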
How is it done?
In our paper, we created a descriptive computational model of the biology, based on the most appropriate models in the literature: Complementary Learning Systems (CLS) and the work of Rolls, both seminal studies with simple test cases. We then built an implementation using ML components while observing our biological plausibility constraints. The result is AHA, an ‘Artificial Hippocampal Algorithm’.
The hippocampus uses three pathways:
- Pattern Separation (PS): generates distinct non-interfering representations to distinguish between similar experiences.
- Pattern Completion (PC): recognizes and creates complete patterns from partial cues
- Pattern Mapping (PM): reconstructs original complete patterns in grounded form
Memory Storage (equivalent to training in a conventional model)
We see the PS pathway as converting non-symbolic representations to symbolic form for memory storage. The symbols are used for a form of self-supervised learning in the PC pathway to recognise future examples. The PM pathway learns to map from those symbols to the grounded non-symbolic form.
Memory Recall (equivalent to inference in a conventional model)
The PC pathway maps non-symbolic input to the symbolic representation. It contains an autoassociative memory for robust recall to crisp symbols. The PM then reconstructs the original input.
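As an illustration of what an autoassociative memory does during recall, the sketch below uses a classic Hopfield-style network: patterns are stored with a Hebbian outer-product rule and a corrupted cue is cleaned up by repeated updates. Treat it as a generic textbook example and not as AHA's actual component. The two stored patterns, the cue and the function names are made up for the demonstration.

```python
def train_hopfield(patterns):
    """Hebbian outer-product rule over bipolar (+1/-1) patterns."""
    n = len(patterns[0])
    weights = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:  # no self-connections
                    weights[i][j] += p[i] * p[j] / n
    return weights


def recall(weights, cue, steps=5):
    """Asynchronously update each unit until the state settles."""
    state = list(cue)
    for _ in range(steps):
        for i in range(len(state)):
            field = sum(w * s for w, s in zip(weights[i], state))
            state[i] = 1 if field >= 0 else -1
    return state


# Two orthogonal "symbols" stored in the memory.
stored = [
    [1, 1, 1, 1, -1, -1, -1, -1],
    [1, -1, 1, -1, 1, -1, 1, -1],
]
weights = train_hopfield(stored)

# A noisy cue: the first symbol with one element flipped.
cue = [1, 1, 1, -1, -1, -1, -1, -1]
crisp = recall(weights, cue)
print(crisp == stored[0])  # True: the flipped element is restored
```

The recalled state snaps to the nearest stored pattern, which is the sense in which recall produces "crisp" symbols from an imperfect first guess.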
The experiments are based on a one-shot classification test made famous by Lake using Omniglot, a dataset of handwritten characters from a range of alphabets. “Compared to other common concepts, handwritten characters are simple and tractable enough to hope that machines, in the near future, will see most of the structure in the images that people do. For these reasons, Omniglot is an ideal testbed for developing more human-like learning algorithms” (Lake 2019).
Many researchers have tested their algorithms with this experiment. It forms a really good base as a benchmark for one-shot learning, for comparison and extension. We extended it to test some of the other target capabilities.
There are two main parts to the experiments.
One-shot classification (from Lake 2015):
- First a ‘memorize’ set of handwritten characters is presented e.g. a, b, c.
- Second, a ‘recognize’ set is presented. It consists of the same characters, handwritten by a different person e.g. b’, a’, c’.
- The system finds matching characters. e.g. the 1st character in ‘memorize’ matches the 2nd character in ‘recognize’. The system must be able to generalise and find a different version of the same character, despite only ever seeing one version of it.
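The matching step can be sketched with a plain nearest-neighbour baseline. This is only meant to show the shape of the task. It is not how AHA performs the matching, and the three-dimensional toy vectors stand in for character images purely for illustration.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def match(memorize, recognize):
    """For each 'recognize' item, return the index of the closest 'memorize' item."""
    return [
        min(range(len(memorize)),
            key=lambda i: squared_distance(memorize[i], r))
        for r in recognize
    ]


def accuracy(predicted, truth):
    correct = sum(1 for p, t in zip(predicted, truth) if p == t)
    return correct / len(truth)


# Toy stand-ins for the characters a, b, c, each seen exactly once.
memorize = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
# The same characters by a "different writer", in the order b', a', c'.
recognize = [[0.1, 0.9, 0.0], [0.8, 0.1, 0.1], [0.0, 0.2, 0.9]]
truth = [1, 0, 2]

predicted = match(memorize, recognize)
print(predicted, accuracy(predicted, truth))  # [1, 0, 2] 1.0
```

On real handwritten characters, raw pixel distance is a weak measure of similarity, which is why the representation used for matching matters so much.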
Instance one-shot classification:
- This is the same, except that every character in the set is of the same letter e.g. a, a’, a”.
- The task is still to match corresponding characters, but now the system must be able to distinguish between very similar examples.
In both experiments, we conducted tests where we added different levels of noise and occlusion. Below is a visual example of each experiment, for the case of added occlusion.
Instance one-shot classification
- Row 1: the memorised characters
- Row 2: the separated symbolic representations
- Row 3: test characters shown to the system
- Row 4: the first recall of symbol
- Row 5: a crisp version from the autoassociative memory
- Row 6: the reconstructed input (the AHA moment of recognition)
In these examples, the characters are occluded by a white dot the same colour as the background. AHA was able to generalise to different exemplars and recall the corresponding sample most of the time. In some cases, such as columns 2, 5 and 6 in the one-shot classification test, the recall is correct even though the occlusion has significantly affected the topology of the character. When it failed, for example in column 8, it did so gracefully, recalling a similar example with a curved bottom and interior dot.
In the instance one-shot classification test, AHA also performed well despite significantly disruptive occlusion. It was able to distinguish extremely similar exemplars whilst generalising over variations caused by occlusion.
AHA showed a range of learning abilities in a unified framework. You can find all the details in the preprint. We’re optimistic that it is an important step toward more animal-like learning for AI, and we’re keen to extend it. Watch this space.
This is a very intriguing concept. How does it tie in with the slow learning process? Something similar to a wake-sleep algorithm, perhaps?
Distinct observations are memorized in the fast-learning Short-Term Memory (AHA). They can then be replayed many times to the Long-Term Memory (LTM), which learns through a slow learning process.
It is similar to the wake-sleep algorithm in the sense that the network adjusts its weights without outside supervision or input.