The diagram above shows our research roadmap for the coming year (July 2020-June 2021). You can compare to last year’s diagram (below) to see our progress. Red filling indicates completion; yellow indicates areas we expect to make progress this year.
The map has two principal stages – improved representations, and generating behaviour using those representations. At the “Representation” end, our research is divided into two streams – Semantic and Episodic memory. Together, these learned representations provide the necessary flexibility for many tasks. The end goal is a “General Purpose Agent” capable of efficiently learning to perform many tasks in continually changing environments. The agent will be able to use the same learning algorithms, representations and control systems in many applications and environments. To achieve this, we aim to develop an agent that uses a mental simulation to learn and plan. This year we will focus on core technologies that support this capability. There’s a lot of jargon here, which is explained in more detail further down the page.
It goes without saying that the final result will not be a superhuman AGI, but rather a software agent that has more general capabilities than current agents. Many additional areas of research necessary for AGI are not even mentioned – such as learning by imitation, transfer learning by language, and improvements in ML technology more broadly. This roadmap has been chosen to address the questions we believe are most relevant to our interests, while also having a reasonable chance of leading to a functional result in the near-term.
Review of progress to date
We recently met to plan our research activities for the coming year. Our previous research programme had two main strands:
- Artificial Hippocampal Algorithm (AHA): A computational model of the complementary memory system as described by Rolls et al. and O’Reilly. The broad objective of this project was to move beyond the limitations of the single type of memory common in conventional ML, and create a bio-inspired, heterogeneous Complementary Learning Systems (CLS) architecture. Many of these capabilities are characterised as Episodic memory – remembering and learning from a specific experience. We demonstrated that the architecture could simultaneously perform one-shot learning of both specific instances and general classes, and that the one-shot learnt representation could be consolidated into a model for long-term retention.
- Recurrent Sparse Memory (RSM) [4,5]: A biologically-plausible neocortical sequence memory, capable of competing with conventional deep-backpropagation (deep-BP) and backpropagation-through-time (BPTT) models such as LSTM, using only local and immediate credit assignment.
Both primary research outputs AHA and RSM use strictly local and immediate credit assignment, making them significantly more biologically-plausible than conventional ML ANN approaches that rely on deep-BP over time and layers.
RSM is comparable to the state of the art on several sequence learning benchmarks, with some slightly better and some slightly worse results. One identified weakness is generalization to unseen sequences, where it appears to underperform compared to e.g. LSTM and particularly Transformer architectures. We aim to improve RSM in this regard using a multi-head attention variant, which will be one of the projects in the coming year.
AHA is comparable to conventional ANN architectures at one-shot class-generalization using the Omniglot dataset (the “Lake” test). Other published models that exploit priors concerning handwriting via stroke formation demonstrate significantly better performance on this benchmark, but we feel our objective has been met: AHA has been shown to complete our extended benchmark of one-shot learning tests to a high standard, demonstrating one-shot instance-learning and both instance and class generalization under occlusion and noise conditions.
Our FY2019 – 2020 research roadmap is depicted above, including progress as of August 2019. In the past year we performed our first experiments with mental simulation technology, and made great progress with the memory representations that underpin it. However, the underlying approach remains unchanged.
Our primary mid-term objective is to create an artificial Agent that can demonstrate correlated-online, continual, few-shot learning of specific instances and general classes. Let’s unpack what that means:
- Agent: We aim to develop a system that interacts with its environment, such as a mobile robot that navigates in a purposeful way.
- Correlated Online Sampling (COS): If learning is to occur in an agent embodied in the real world, it needs to learn from a single sequence of highly correlated samples (e.g. successive video frames will be similar to each other as the robot moves through a space). This is in contrast to most ANNs trained by Stochastic Gradient Descent (SGD), which require a large number of i.i.d. samples. Our definition of COS is that samples are presented one at a time, with a minibatch size of 1 (if SGD is used). We allow SGD with minibatch size > 1, if COS input is stored within a memory buffer accounted for as part of the agent’s memory, not part of the environment.
- Continual: If the environment is constantly changing, the Agent must learn continually. The hard part is assimilating new data from a single online stream without forgetting knowledge that hasn’t been refreshed recently.
- Few-Shot: In the real world we generally only get a few samples of a new concept before we have to start using that experience for goal-directed tasks. This is known as few-shot learning. In the extreme, we might want to learn from a single experience (One-Shot).
- Instances: Humans have the ability to reason about both specific instances (e.g. my cup, or your cup) and general classes (cups, plates, bowls etc.). Both types of reasoning are essential aspects of Agent behaviour for sane interactions with people – or at the minimum good etiquette!
Working with Correlated Online Sampling (COS)
There are a number of ways that models trained with SGD can be applied to problems that seemingly imply COS.
Prior knowledge: Approaches such as SLAM are effective for robot mapping of novel environments, sensed in COS conditions. However, although a map is being learned, the ability to perform the mapping task is developed a priori and hard-coded into the agent.
Learning to Learn: Another approach is “learning to learn” – the model is trained by SGD on a pre-existing superset of all conditions it might experience, and then rapidly adapts this prior knowledge to the new COS data in the final environment. The pre-learned model is relatively static. The limitation here is that the superset must include all meaningful variation that would be experienced in the wild.
Make it IID: Alternatively, we can use techniques such as an Experience Replay buffer to accumulate COS data until there’s enough for IID sampling, and then learn. This will still suffer from some correlated sampling effects, and it may be very difficult to achieve enough meaningful exploration without already learning.
Don’t learn: We can pre-train the model using SGD for all conceivable input, and not train during operation. This is also surprisingly common.
In this paradigm, limited behavioural adaptation (e.g. Hebbian) in response to user preferences may still be allowed, but the harder, “intelligent” functions are entirely pre-trained.
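The “Make it IID” alternative above is typically implemented with an experience replay buffer. The minimal sketch below (hypothetical names, not our implementation) accumulates a correlated stream and then draws uniform minibatches from it, which breaks the temporal correlation at the cost of memory that must be counted against the agent’s budget.

```python
import random
from collections import deque

class ReplayBuffer:
    """Accumulate correlated (COS) samples, then draw near-IID minibatches.

    Per our accounting rules, this buffer is part of the agent's memory,
    not part of the environment.
    """
    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)   # oldest samples fall off the end

    def add(self, sample):
        self.buffer.append(sample)

    def sample(self, batch_size):
        # Uniform sampling without replacement breaks temporal correlation.
        return random.sample(list(self.buffer), batch_size)

buf = ReplayBuffer(capacity=100)
for t in range(250):           # a maximally correlated stream: 0, 1, 2, ...
    buf.add(t)
batch = buf.sample(8)          # drawn uniformly from the 100 most recent samples
```

Note the residual weakness mentioned above: the buffer only ever holds a recent window, so sampling is IID within that window but still biased toward recent experience.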
The common weakness of these alternatives is they don’t allow an agent to adapt to situations not considered by its creators. For example, a robot companion at an end user’s home should adapt to the user’s habits and preferences. These alternatives do not allow open-ended learning after deployment of the agent to the operating environment.
It’s important that we measure our algorithms in ways that can be compared to existing literature. We aim to show that we can achieve comparable performance to state-of-the-art approaches, but with our self-imposed biological-plausibility criteria and demonstrating additional new capabilities.
We may also define new benchmarks that cover a wider range of capabilities than seen in the literature, but since these will not have existing performance baselines we still need the established benchmarks for comparison to other methods.
Since we have already explored one-shot learning – to an extent – the most difficult tasks in our new agenda are the “continual, few-shot” learning requirements. The most popular benchmarks in this research area are image classification tasks. A great resource for this emerging community is the Continual AI group and in particular their Slack group.
There is a rich and growing body of work on both few-shot learning and continual learning, but very few examples of work that combines both. Recently, Antoniou et al. described a framework that combines them into ‘Continual Few-Shot Learning’ (CFSL), and compared multiple baseline few-shot learning architectures, demonstrating their strengths and weaknesses on CFSL. The CFSL framework is a superset of few-shot and continual learning, and is flexible, systematic and descriptive. The main variables are the number of examples of a class, and the number and timing of continually introduced classes, thus defining a spectrum from rapid few-shot continual learning to long-term continual learning. The framework describes evaluation metrics, including:
- Generalization of classification accuracy to disjoint sets
- Storage / memory requirements
- Computational cost
Antoniou et al specify two baseline datasets, of low and high complexity respectively:
LOW: Omniglot (handwritten characters)
HIGH: SlimageNet64 “Slim ImageNet”
In their framework, all continual learning classification tasks are defined by the way new classes (e.g. types of animal) and new instances (e.g. new images of a known type of animal) are introduced. This covers the three accepted cases for continual object recognition – New Instances (NI), New Classes (NC), and New Instances and Classes (NIC). It can be seen that a single Agent exploring a new world would experience an NIC-type task.
These tasks are further categorized depending on the rate and schedule for the introduction of NI and NC. Typically, introduction is “incremental” meaning that samples are gradually introduced over a period of time. Samples may also be removed, meaning they are no longer presented, but might still occur in the test set. As you can see there are many variables to consider.
These cases fall within a “Single Incremental Task” categorisation, in contrast to “Multi-Task”, which differs by not overlapping the presentation of old and new classes, and offering signals to the algorithm on task change (which enables e.g. allocation of new memory resources, or changing the behaviour of the memory). Other related areas include Domain Adaptation, Transfer Learning, Learning to Learn, and Open World Learning. These all involve similar but distinct questions – for example, learning to learn involves improving the quality and speed of future learning outcomes.
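The incremental class-introduction schedules described above can be sketched as a small stream generator. This is our own illustrative rendering of an NC-style schedule in the spirit of the Antoniou et al. framework, not their reference code; the function and sample names are hypothetical.

```python
import random

def cfsl_stream(classes, shots_per_class, classes_per_step, seed=0):
    """Yield a continual few-shot stream: classes introduced incrementally.

    Each step (support set) introduces `classes_per_step` new classes with
    `shots_per_class` examples each -- an NC-style schedule. Re-presenting
    known classes with fresh samples would make it NI, and mixing the two
    would make it NIC.
    """
    rng = random.Random(seed)
    order = list(classes)
    rng.shuffle(order)                        # class arrival order is randomised
    for i in range(0, len(order), classes_per_step):
        step_classes = order[i:i + classes_per_step]
        support = [(c, f"sample-{c}-{k}")     # placeholder stand-ins for images
                   for c in step_classes
                   for k in range(shots_per_class)]
        yield support

# 10 classes, 5 shots each, 2 new classes per step -> 5 support sets.
steps = list(cfsl_stream(classes=range(10), shots_per_class=5, classes_per_step=2))
```

Varying `shots_per_class` and `classes_per_step` sweeps the spectrum mentioned above, from rapid few-shot continual learning to long-term continual learning.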
Video classification is an important stepping stone toward agent based CFSL. Although there are no video datasets used in the CFSL paper, video classification can also be captured within the CFSL framework, with the added dimension that we have a natural ordering of image samples that is highly correlated and may be problematic for many algorithms.
From a Hippocampal perspective, videos correspond more closely to the widespread understanding of Episodic Memory as sequences in time experienced by an Agent. There are two existing benchmarks that we’re interested in: OpenLoris-Object and OpenLoris-Scene, both aiming to advance research into learning in continually changing environments.
In OpenLoris-Object, a variety of sensor data was collected with a mobile robot platform; the task is to recognize objects in realistic environments despite variation in illumination, object occlusion, object size, camera-object distance/angle, and clutter. The videos focus on an object as it is picked up and manipulated so that it can be seen from different angles. This dataset was used for the “IROS 2019 Lifelong Robotic Vision Challenge”. In OpenLoris-Scene, the robot moves about the environment.
We plan to start with OpenLoris-Object for the first stage of CFSL with video. We would then use OpenLoris-Scene for CFSL in the context of navigation.
Modelling Episodes in Time
In addition to object recognition in video, we can expect our model to do more with video-episodes. For example, it should be able to remember a sequence of images and replay it. We can test this explicitly, but we have not yet defined specific procedures for measuring the quality of video-recall (replay).
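Although we have not settled on a measurement procedure, one simple candidate is per-frame reconstruction error between the original and replayed episodes. The sketch below is our illustrative assumption, not an established metric; ordering errors would need to be scored separately (e.g. by an edit distance over nearest-frame assignments).

```python
import numpy as np

def replay_fidelity(original, replayed):
    """Mean per-frame MSE between an original episode and its replay.

    Lower is better; 0.0 means pixel-perfect recall. Assumes the replayed
    episode is already aligned frame-by-frame with the original.
    """
    original = np.asarray(original, dtype=float)
    replayed = np.asarray(replayed, dtype=float)
    assert original.shape == replayed.shape, "episodes must align frame-by-frame"
    # Average the squared error over every axis except the frame (time) axis.
    per_frame = ((original - replayed) ** 2).mean(axis=tuple(range(1, original.ndim)))
    return float(per_frame.mean())

episode = np.random.default_rng(0).random((10, 8, 8))   # 10 frames of 8x8 "video"
noisy = episode + 0.1                                   # a uniformly degraded replay
```

A perfect replay scores 0.0, and the uniformly shifted replay above scores approximately 0.01 (the square of the 0.1 offset).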
Ultimately we aim to measure behaviour in a robot agent, which makes navigation one of the most obvious tasks, as it doesn’t require any additional capabilities beyond the most essential! How can we demonstrate that our navigating agent can do online, continual, few-shot learning of instances and classes?
The answer is we can place the robot in a simulated environment, and allow it to explore and learn how to navigate. We can teleport it to different start positions, and reward it for being able to navigate to specific goal locations. Subsequently, we can open a “door” to a new part of the map, and allow it to explore once. Thereafter, the robot should be able to navigate from any part of the map to a specified goal in the new area, which would constitute one-shot learning from a single online episode.
A good simulated robot environment might be the “Gibson Environment”. This is a dataset recorded from the real world consisting of spatial structure and appearance of many real homes. This level of detail allows both active and passive robot sensors to be simulated and for repeatable trials of mobile robot navigation in a realistic, cluttered domestic environment.
The most comparable existing benchmarks for few-shot navigation in a continual-learning framework are some of the experiments used to evaluate the MERLIN architecture (Wayne et al, 2018). In their paper “Unsupervised Predictive Memory in a Goal-Directed Agent”, Wayne et al. described a variety of “Goal Finding” tasks in which an environment is modified over time. Some modifications affect localization or perception only; some are crucial to the task. In some tasks, distant objects in the skyline are constant, and the goal “object” is also constant. In another variant of the task, “proximate” objects are constant, but the distant ones are missing. There are variations in door open/closed status and environment size. Agent movement within the environment is continuous, providing an added element of difficulty. To measure success, Wayne et al. say: “On arriving at the goal, the agent will be teleported at random in one of the rooms, and must find the goal again. Returning to the goal faster on subsequent teleportations is evidence that the goal location has been memorised [few-shot] by the agent.” They explicitly measured the agent’s ability to rapidly learn new navigational policies in the changing environment.
Our approach contains a fast-learning memory that can replay stored information to a more slowly learning memory. It therefore falls into the category of “replay”-based methods of continual learning. The fast memory necessarily forgets more quickly, otherwise it would rapidly become full. The slow memory learns and forgets slowly. This is in accord with Complementary Learning Systems (CLS) (Rolls et al 2013) (see figure below).
Other approaches to continual learning are shown in the figure below, which is reproduced from Lange et al (2019). Our approach is based on CLS, which is a Replay-Rehearsal approach in this taxonomy.
We can immediately exploit a fast-learning short-term memory (STM) by interpolating between the inferences of the STM and the slow-learning, long-term memory (LTM). We have obtained preliminary results showing that this works, to some extent. Gideon will present some of this work at NAISYS 2020.
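The STM/LTM interpolation can be sketched very simply. The blend below is a convex combination of the two memories’ softmax outputs – one of several reasonable choices, and an illustration rather than the exact scheme in our preliminary results.

```python
import numpy as np

def combined_inference(stm_logits, ltm_logits, alpha=0.5):
    """Interpolate STM and LTM class predictions.

    alpha weights the fast-learning STM; (1 - alpha) weights the slow LTM.
    A recently-seen class can win via the STM even before the LTM has had
    time to consolidate it.
    """
    def softmax(z):
        e = np.exp(z - z.max())
        return e / e.sum()
    return alpha * softmax(np.asarray(stm_logits, dtype=float)) \
        + (1 - alpha) * softmax(np.asarray(ltm_logits, dtype=float))

# STM strongly favours class 2 (just seen one-shot); the LTM has never seen it.
p = combined_inference([0.0, 0.0, 4.0], [1.0, 1.0, 0.0], alpha=0.6)
```

With these inputs the combined prediction selects class 2, illustrating how the fast memory can dominate for novel, recently-learned material while the LTM still governs well-consolidated classes.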
A more sophisticated approach is to use the STM to recall & replay recent, salient memories to the LTM so they can be permanently learned. This is the essence of the “replay” approach to CFSL. We can vary the prevalence of the replayed memories continuously, for optimum results. Existing research supports use of this strategy. It also matches current understanding of the biological function of the Hippocampus, which can reinstate Neocortical patterns via the Entorhinal cortex.
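The replay strategy above amounts to mixing replayed STM content into the LTM’s training stream at a tunable rate. The sketch below is a minimal rendering of that idea with hypothetical names; how the replay fraction is scheduled is exactly the kind of question we intend to investigate.

```python
import random

def consolidation_batch(stm_episodes, new_samples, replay_fraction, batch_size, rng):
    """Build an LTM training batch mixing fresh input with STM replay.

    replay_fraction controls the prevalence of replayed memories and can be
    varied continuously -- e.g. raised when the input distribution is stable,
    lowered when novelty is high.
    """
    n_replay = int(round(replay_fraction * batch_size))
    n_new = batch_size - n_replay
    replay = rng.choices(stm_episodes, k=n_replay) if stm_episodes else []
    fresh = rng.choices(new_samples, k=n_new)
    batch = replay + fresh
    rng.shuffle(batch)          # interleave replayed and fresh samples
    return batch

rng = random.Random(0)
stm = [("old", i) for i in range(20)]       # salient recent memories held in the STM
stream = [("new", i) for i in range(5)]     # incoming COS samples
batch = consolidation_batch(stm, stream, replay_fraction=0.25, batch_size=16, rng=rng)
```

Here a quarter of each LTM batch is replayed STM content, so old knowledge keeps being rehearsed while new data is assimilated.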
Specific Research Activities
Our memory systems will be based on the CLS approach we already adopted for AHA, but incorporating new features and methods we will develop ourselves, or adapt from existing research. We will research and develop new capabilities and measure the results in a variety of existing and new benchmarks.
Our research starts from 3 existing technologies, with which we have already experimented to some extent. First, our existing AHA model of the CLS. We have already developed and demonstrated the ability to recall and replay crisp, high-quality memories learned one-shot.
We will further test recall and replay in the new CFSL experiment framework so that it can be compared to existing state-of-the-art methods.
Separately, we will develop a new memory system that can learn under COS, few-shot conditions. Our most promising contender for this is the Growing When Required (GWR) network, which has shown compelling performance in recent publications, and learns using a Hebbian rule – local and immediate, one sample at a time. Learning has been shown not to catastrophically interfere with existing memories.
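A minimal sketch of the GWR idea (after Marsland et al., 2002) is shown below. It keeps only the core mechanism – the best-matching node moves toward the input, and a new node is inserted when the best match is both poor (low activity) and already well-trained (low habituation). Edges, second-best-match bookkeeping, and neighbour updates are omitted for brevity, and the constants are illustrative, so treat this as a conceptual sketch rather than a faithful implementation.

```python
import numpy as np

class GWR:
    """Minimal Growing-When-Required sketch (after Marsland et al., 2002).

    Learning is Hebbian, local, and one-sample-at-a-time, compatible with
    the COS conditions described earlier.
    """
    def __init__(self, dim, a_T=0.8, h_T=0.3, eps=0.2, tau=3.0):
        self.w = [np.random.rand(dim), np.random.rand(dim)]  # start with 2 nodes
        self.h = [1.0, 1.0]          # habituation: 1 = fresh, decays with use
        self.a_T, self.h_T, self.eps, self.tau = a_T, h_T, eps, tau

    def step(self, x):
        x = np.asarray(x, dtype=float)
        dists = [np.linalg.norm(x - w) for w in self.w]
        b = int(np.argmin(dists))                  # best-matching node
        activity = np.exp(-dists[b])               # 1.0 = perfect match
        if activity < self.a_T and self.h[b] < self.h_T:
            # Grow: the best match is poor AND well-trained -> insert a node.
            self.w.append((self.w[b] + x) / 2.0)
            self.h.append(1.0)
        else:
            # Otherwise adapt: Hebbian move toward the input, then habituate.
            self.w[b] += self.eps * self.h[b] * (x - self.w[b])
            self.h[b] += (1.0 / self.tau) * (0.0 - self.h[b])
        return b

# Feed two well-separated clusters; the network grows nodes to cover both.
np.random.seed(0)
net = GWR(dim=2)
for t in range(60):
    target = np.array([5.0, 5.0]) if t % 2 else np.array([0.0, 0.0])
    net.step(target + 0.05 * np.random.randn(2))
```

The key property for our purposes is that growth is driven by novelty rather than a fixed schedule, which is what allows learning without catastrophically interfering with existing memories.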
Using the new memory, we will create a new version of AHA that does not forget between episodes, although it will still learn quickly and forget quickly. We will also exploit this capability to few-shot learn episodes that extend over time and comprise many images. This is the necessary prior technology for application to a mobile robot.
For navigation, we aim to integrate these memory systems with Hindsight Experience Replay (HER). HER has already been demonstrated to learn to move objects to arbitrary positions; it seems likely the same method can learn to navigate a mobile robot, possibly in the Gibson Environment. We will use the GWR label-learning technique to learn Q-values for use in HER, enabling learning to be online but still neural – more compressed than a simple Q-table.
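The core trick in HER is goal relabelling: failed trajectories are reinterpreted as successes for the goals that were actually achieved. The toy sketch below shows the “future” relabelling strategy from Andrychowicz et al. on a gridworld episode; the data layout and names are our illustrative assumptions, not the paper’s code.

```python
import random

def her_relabel(episode, k=4, rng=random):
    """Hindsight relabelling: treat achieved states as if they were goals.

    `episode` is a list of (state, action, next_state, goal) tuples. For each
    transition we keep the original (usually a failure) and add up to k copies
    whose goal is replaced by a state actually reached later in the episode
    (the "future" strategy) -- those copies are successes by construction.
    """
    relabelled = []
    for t, (s, a, s_next, goal) in enumerate(episode):
        relabelled.append((s, a, s_next, goal, s_next == goal))
        future = [tr[2] for tr in episode[t:]]           # states reached from t on
        for new_goal in rng.sample(future, min(k, len(future))):
            relabelled.append((s, a, s_next, new_goal, s_next == new_goal))
    return relabelled

# A toy episode that never reaches the intended goal (9, 9)...
episode = [((i, 0), "right", (i + 1, 0), (9, 9)) for i in range(5)]
# ...still yields successful training signal after relabelling.
transitions = her_relabel(episode, k=2, rng=random.Random(0))
```

This is why HER suits sparse-reward navigation: every wandering exploration episode becomes useful supervision for reaching the places the robot did visit.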
There are some other research areas we anticipate we may need to explore. These include learning more stable, quasi-symbolic representations for use in the STM. Possible approaches include use of more stable, disentangled and equivariant representations – perhaps via use of VAEs – and using attention to produce stable representations of objects of interest while ignoring background clutter.
- Gideon Kowadlo, Abdelrahman Ahmed, David Rawlinson: “AHA! an ‘Artificial Hippocampal Algorithm’ for Episodic Machine Learning” Arxiv Preprint (2019) https://arxiv.org/abs/1909.10340
- Rolls, E.T.: The mechanisms for pattern completion and pattern separation in the hippocampus. Frontiers in Systems Neuroscience 7(October), 1–21 (2013) https://www.frontiersin.org/articles/10.3389/fnsys.2013.00074/full
- Randall C. O’Reilly, Rajan Bhattacharyya, Michael D. Howard, Nicholas Ketz: “Complementary Learning Systems” Cognitive Science 38(6):1229-48, doi: 10.1111/j.1551-6709.2011.01214.x (2014) http://psych.colorado.edu/~oreilly/papers/OReillyBhattacharyyaHowardKetz11.pdf
- David Rawlinson, Abdelrahman Ahmed, Gideon Kowadlo: “Learning distant cause and effect using only local and immediate credit assignment” Arxiv Preprint (2019) https://arxiv.org/abs/1905.11589
- Jeremy Gordon, David Rawlinson, Subutai Ahmad: “Long Distance Relationships without Time Travel: Boosting the Performance of a Sparse Predictive Autoencoder in Sequence Modeling” Arxiv Preprint (2019) https://arxiv.org/abs/1912.01116
- Lake, B.M., Salakhutdinov, R., Tenenbaum, J.B.: Human-level concept learning through probabilistic program induction. Science 350(6266), 1332–1338 (2015) https://web.mit.edu/cocosci/Papers/Science-2015-Lake-1332-8.pdf
- Antreas Antoniou, Massimiliano Patacchiola, Mateusz Ochal, Amos Storkey: “Defining Benchmarks for Continual Few-Shot Learning” Arxiv Preprint (2020) https://arxiv.org/abs/2004.11967
- Vincenzo Lomonaco, Davide Maltoni: “CORe50: a New Dataset and Benchmark for Continuous Object Recognition” Proceedings of the 1st Annual Conference on Robot Learning, PMLR 78:17-26 (2017) http://proceedings.mlr.press/v78/lomonaco17a.html
- Qi She, Fan Feng, Qi Liu, Rosa H. M. Chan, Xinyue Hao, Chuanlin Lan, Qihan Yang, Vincenzo Lomonaco, German I. Parisi, Heechul Bae, Eoin Brophy, Baoquan Chen, Gabriele Graffieti, Vidit Goel, Hyonyoung Han, Sathursan Kanagarajah, Somesh Kumar, Siew-Kei Lam, Tin Lun Lam, Liang Ma, Davide Maltoni, Lorenzo Pellegrini, Duvindu Piyasena, Shiliang Pu, Debdoot Sheet, Soonyong Song, Youngsung Son, Zhengwei Wang, Tomas E. Ward, Jianwen Wu, Meiqing Wu, Di Xie, Yangsheng Xu, Lin Yang, Qiaoyong Zhong, Liguang Zhou: “IROS 2019 Lifelong Robotic Vision Challenge — Lifelong Object Recognition Report” Arxiv Preprint (2020) https://arxiv.org/abs/2004.14774
- Greg Wayne, Chia-Chun Hung, David Amos, Mehdi Mirza, Arun Ahuja, Agnieszka Grabska-Barwinska, Jack Rae, Piotr Mirowski, Joel Z. Leibo, Adam Santoro, Mevlana Gemici, Malcolm Reynolds, Tim Harley, Josh Abramson, Shakir Mohamed, Danilo Rezende, David Saxton, Adam Cain, Chloe Hillier, David Silver, Koray Kavukcuoglu, Matt Botvinick, Demis Hassabis, Timothy Lillicrap: “Unsupervised Predictive Memory in a Goal-Directed Agent” Arxiv Preprint (2018) https://arxiv.org/abs/1803.10760
- Matthias De Lange, Rahaf Aljundi, Marc Masana, Sarah Parisot, Xu Jia, Ales Leonardis, Gregory Slabaugh, Tinne Tuytelaars: “A continual learning survey: Defying forgetting in classification tasks” Arxiv Preprint (2019) https://arxiv.org/abs/1909.08383
- Marcin Andrychowicz, Filip Wolski, Alex Ray, Jonas Schneider, Rachel Fong, Peter Welinder, Bob McGrew, Josh Tobin, OpenAI Pieter Abbeel, Wojciech Zaremba: “Hindsight Experience Replay” Advances in Neural Information Processing Systems 30 (2017) https://papers.nips.cc/paper/7090-hindsight-experience-replay.pdf
- German I. Parisi, Ronald Kemker, Jose L. Part, Christopher Kanan, Stefan Wermter: “Continual lifelong learning with neural networks: A review” Neural Networks, Volume 113, May 2019, Pages 54-71 (2019) https://www.sciencedirect.com/science/article/pii/S0893608019300231
- German I. Parisi, Jun Tani, Cornelius Weber, Stefan Wermter: “Lifelong learning of human actions with deep neural network self-organization” Neural Networks, Volume 96, December 2017, Pages 137-149 (2017) https://www.sciencedirect.com/science/article/pii/S0893608017302034