Hippocampal-style Replay for Continual RL (2021-)
This project takes a different approach to RL, inspired by evidence that the hippocampus replays experiences directly to the frontal cortex, likely for model building. This contrasts with the mainstream view in cognitive science and ML, where ‘experience replay’ ultimately serves to improve a policy. We hypothesise that this approach will be more sample-efficient, generalise better to new tasks, and have higher plasticity while maintaining stability. We will apply it to advance the state of the art in continual and few-shot RL within a single architecture.
Our architecture is based on World Models (Ha and Schmidhuber, 2018), and we’re starting with the same game, Car Racing, shown in the image above.
Executive control with Biological Model-free and Model-based RL (2022-)
In this project we will build a novel RL architecture inspired by the brain, with a world model representing the Frontal Cortex, gated by a model-free algorithm representing the Basal Ganglia. Together, the agent gains the benefits of both model-free and model-based approaches, overcoming the limitations of each. The functional connectivity between the Basal Ganglia and Frontal Cortex is based on the PBWM model of working memory by O’Reilly and Frank.
The objectives are twofold:
- explain aspects of executive control better than model-based, model-free, or other approaches to combining them, with implications for computational psychiatry
- show favourable properties for RL in AI (sample efficiency, better ability to generalize)
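The core mechanic can be illustrated with a toy sketch (not the project's implementation): a model-free "Go/NoGo" value, standing in for the Basal Ganglia, gates whether a working-memory slot, standing in for the Frontal Cortex, updates to a new stimulus or maintains its current contents, in the spirit of PBWM. All names and the threshold are illustrative assumptions.

```python
def gated_memory_step(memory, stimulus, gate_value, threshold=0.5):
    """PBWM-style gating sketch: a model-free "Go" signal (stand-in for
    the Basal Ganglia) decides whether the working-memory slot (stand-in
    for the Frontal Cortex) updates to the new stimulus or keeps
    maintaining its current contents."""
    if gate_value > threshold:   # Go: update working memory
        return stimulus
    return memory                # NoGo: maintain current contents
```

In the full architecture the gate value would itself be learned model-free (e.g. by temporal-difference methods), so the agent learns *when* to update its world model's working memory.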
Why do brains have 2 hemispheres? And can it improve Deep Learning and AI? (2021-2022)
Hippocampal model for structural and temporal generalization (2021-2022)
The interaction between the hippocampus and frontal cortex is a crucial ingredient of animal cognition. It underpins our ability to learn from a single experience and immediately generalise both structurally and temporally. In this project we extend our previous AHA model with elements of Schapiro’s model, to develop a model that can learn specifics fast and perform both structural and temporal generalization. We are writing it up now (Sep 2022).
Continual Few-Shot Learning of Episodes in Time (2020-2022)
This project aims to develop our AHA software model of the Complementary Learning Systems (CLS) paradigm so that it can memorize and recall episodes in time. Whereas the original AHA model memorizes and forgets a minibatch of samples every time it is used, this modified version will memorize instantly and forget slowly, allowing time for selective replay and consolidation into long-term memory. The model will be tested on video scenes from the OpenLORIS mobile robot scene dataset.
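The "memorize instantly, forget slowly" behaviour described above can be sketched minimally as follows. This is an illustrative toy, not the AHA implementation: items enter a fast store at full strength, strengths decay each step, and selective replay samples surviving items (in proportion to remaining strength) for consolidation into a long-term store. The class and parameter names are assumptions.

```python
import random

class FastStore:
    """Toy sketch of a fast episodic buffer: one-shot memorization,
    gradual forgetting, and strength-weighted selective replay."""
    def __init__(self, decay=0.95, floor=0.1):
        self.items, self.decay, self.floor = {}, decay, floor

    def memorize(self, key, value):
        self.items[key] = (value, 1.0)          # instant, one-shot storage

    def step(self):
        # Gradual forgetting: decay strengths, drop items below the floor.
        self.items = {k: (v, s * self.decay)
                      for k, (v, s) in self.items.items()
                      if s * self.decay > self.floor}

    def replay(self, n=1):
        # Selective replay: sample in proportion to remaining strength.
        if not self.items:
            return []
        keys = list(self.items)
        weights = [self.items[k][1] for k in keys]
        picks = random.choices(keys, weights=weights, k=min(n, len(keys)))
        return [(k, self.items[k][0]) for k in picks]
```

The slow decay leaves a window in which replayed items can be consolidated into a complementary long-term learner before they are forgotten.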
The project developed into continual few-shot learning: the use of replay to improve learning of general classes and specific instances after only a few examples. See the preprint.
Zero-Shot Agent Navigation with Hindsight Experience Replay (2020-2022)
We will extend the Hindsight Experience Replay concept from manipulation of objects to navigation of a mobile robot Agent. We will also attempt to use policy learning models that can function in correlated-online-sampling conditions, so that a single mobile Agent can learn from its individual experiences. Combined with our Episodic memory system for fast few-shot or even one-shot learning, the Agent will be able to adapt rapidly to a continually changing environment. Finally, by adding the ability to plan in a mental simulation, the Agent should demonstrate zero-shot navigation – successfully planning & executing a path it has never experienced. The project is currently paused until we have a larger team.
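The Hindsight Experience Replay idea mentioned above can be sketched briefly. This is a minimal illustration of the standard "future" relabeling strategy (Andrychowicz et al.), not our navigation implementation: failed transitions are relabeled with goals that were actually achieved later in the same episode, turning sparse-reward failures into useful training signal. The episode format and reward values are assumptions.

```python
import random

def her_relabel(episode, k=4):
    """Hindsight Experience Replay relabeling ("future" strategy).
    `episode` is a list of (state, action, achieved_goal) tuples.
    Each transition is duplicated with up to k goals sampled from
    goals achieved later in the episode, rewarded as if intended."""
    transitions = []
    for t, (state, action, achieved) in enumerate(episode):
        # Candidate goals: everything achieved from this step onward.
        future = [episode[i][2] for i in range(t, len(episode))]
        for goal in random.sample(future, min(k, len(future))):
            reward = 0.0 if achieved == goal else -1.0  # sparse goal reward
            transitions.append((state, action, goal, reward))
    return transitions
```

For navigation, the "achieved goal" would be the Agent's reached position, so every trajectory teaches the policy how to get somewhere, even when it missed its original target.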
Predictive Generative Models (2019-2020)
This project aims to develop a predictive model – such as our RSM – into a self-looping generative model that could be used to create simulations of future events. We’ve begun testing the model on the “bouncing balls” task described in “Predictive Generative Networks” (Lotter et al 2015), and RAVDESS, a database of actors singing, speaking and emoting.
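The "self-looping" idea is simple to sketch: a one-step predictive model is run in a closed loop, feeding each prediction back as the next input to simulate a future trajectory. The rollout helper and the toy 1-D "bouncing ball" dynamics below are illustrative assumptions, not the RSM model.

```python
def rollout(model, frame, steps):
    """Run a one-step predictive model in a closed loop: each predicted
    frame is fed back as the next input, producing a simulated future."""
    frames = [frame]
    for _ in range(steps):
        frame = model(frame)
        frames.append(frame)
    return frames

def bounce(state):
    """Toy 1-D bouncing ball: (position, velocity), reflecting at walls."""
    x, v = state
    x += v
    if x < 0 or x > 10:      # hit a wall: reverse velocity and reflect
        v = -v
        x += 2 * v
    return (x, v)
```

With a learned predictor in place of `bounce`, the same loop generates imagined video continuations, which is the step toward mental simulation described above.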
This project was a crucial step towards our long-term goal of developing neural models of mental simulation.
Reinforcement learning of attentional strategies (2019-2020)
Recently, Transformers and other attentional neural architectures have shown great promise in a number of domains. How can we reconcile the deep backpropagation used in Transformers with our desire for biologically plausible, local credit assignment? We aim to use the Bellman equation – the foundation of discounted Reinforcement Learning – to learn attentional strategies for controlling a set of RSM memory “heads” such that their predictive abilities are maximized.
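As a concrete reference point, a single tabular Q-learning step derived from the Bellman optimality equation looks like this. This is a generic sketch, not the project's algorithm; treating "which memory head to attend to" as the action is our framing, and the head names are hypothetical.

```python
def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """One tabular Q-learning step from the Bellman optimality equation:
    Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)).
    Here an "action" could be the choice of which RSM memory head to
    attend to, with reward r given by predictive accuracy."""
    best_next = max(Q[s_next].values()) if Q.get(s_next) else 0.0
    td_target = r + gamma * best_next
    Q.setdefault(s, {}).setdefault(a, 0.0)
    Q[s][a] += alpha * (td_target - Q[s][a])
    return Q[s][a]
```

The appeal is that this update is local in time: no gradients need to flow back through the attention decisions, unlike deep backpropagation.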
Learning distant cause & effect with only local credit assignment (2018-2019)
In May we introduced our Recurrent Sparse Memory (RSM). We showed that it can learn to associate distant causes and effects, handle higher-order and partially observable sequences, and we also tested it on natural language modelling. RSM demonstrates capabilities that previously required deep backpropagation and gated memory networks such as LSTMs, while using computationally and memory-efficient local learning rules.
Episodic Memory (2018-2019)
Imagine trying to accomplish everyday tasks with only a memory for generic facts – without even remembering who you are and what you’ve done so far! That is the basis for most AI/ML algorithms.
We’re developing a complementary learning system with a long-term memory akin to the neocortex and a shorter-term system analogous to the hippocampus.
The objective is to enable “episodic memory” of combinations of specific states, enhancing the learning and memory of ‘typical’ patterns (i.e. classification). This in turn enables a self-narrative, faster learning with less data, and the ability to build on existing knowledge.
Continuous online learning of sparse representations (2017-2018)
This project was the foundation of our approach to learning representations of data, with ambitious criteria – continuous, online, unsupervised learning of sparse distributed representations, resulting in state-of-the-art performance even given nonstationary input. We reviewed a broad range of historical techniques and experimented with some novel mashups of older competitive learning and modern convolutional networks. We obtained some fundamental insights into effective sparse representations and how to train them.
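One of the classic building blocks we drew on can be sketched in a few lines: k-winners-take-all sparsification combined with an online competitive-learning update, where only winning units move toward the input (a purely local rule). This is a generic illustration of the technique, not our trained model; function names and parameters are assumptions.

```python
def kwta(x, k):
    """k-winners-take-all: keep the k largest activations, zero the rest,
    yielding a sparse distributed code (ties may keep a few extra units)."""
    if k <= 0:
        return [0.0] * len(x)
    thresh = sorted(x, reverse=True)[k - 1]
    return [v if v >= thresh else 0.0 for v in x]

def competitive_step(weights, x, k=2, lr=0.1):
    """One online competitive-learning step: units whose weight vectors
    best match the input win, and only winners move toward the input."""
    scores = [sum(wi * xi for wi, xi in zip(w, x)) for w in weights]
    sparse = kwta(scores, k)
    for j, s in enumerate(sparse):
        if s != 0.0:   # local update: winners only
            weights[j] = [w + lr * (xi - w) for w, xi in zip(weights[j], x)]
    return sparse
```

Because the update touches only the winners, the rule works online on a nonstationary stream without any global error signal.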
Predictive Capsules (2018)
We believe that Capsule networks promise inherently better generalization, addressing a key weakness of conventional artificial neural networks.
We published an initial paper on unsupervised sparse Capsules earlier this year, extending the work of Sabour et al. to allow only local, unsupervised training, and arguably obtained much better generalization. We are now developing a much better understanding of Capsules and how they might be implemented by pyramidal neurons.
Since we ended this project, Kosiorek et al have developed a better version of the same ideas called “Stacked Capsule Autoencoders”. Their results are better too!
Alternatives to BackPropagation Through Time (BPTT)
We are intensely interested in biologically plausible alternatives to backpropagation through time (BPTT). BPTT is used to associate causes and effects that are widely separated in time. The problem is that it requires storing intermediate quantities for all synaptic weights at every time-step up to a fixed horizon (e.g. 1000 steps). Not only is this memory-intensive, but the finite time window is also very restrictive. There is no neurological equivalent of BPTT – nature does it another way, which we hope to copy.
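The memory cost is easy to see in a minimal worked example. Below is BPTT for a scalar linear RNN h_t = w·h_{t-1} + x_t with loss L = 0.5·h_T²; note that the forward pass must cache every hidden state, so memory grows linearly with the sequence length. This is a textbook illustration of BPTT, not any of our models.

```python
def bptt_grad(w, xs, h0=0.0):
    """BPTT for the scalar linear RNN h_t = w*h_{t-1} + x_t with loss
    L = 0.5 * h_T**2. Returns (dL/dw, number of cached states) --
    the O(T) activation cache is the cost that motivates alternatives."""
    hs = [h0]
    for x in xs:
        hs.append(w * hs[-1] + x)   # forward pass: cache every h_t
    grad, dh = 0.0, hs[-1]          # dL/dh_T = h_T
    for t in range(len(xs), 0, -1):
        grad += dh * hs[t - 1]      # direct term: dh_t/dw = h_{t-1}
        dh *= w                     # propagate dL/dh back one step
    return grad, len(hs)
```

A local learning rule like the one in RSM avoids this backward sweep entirely, and so needs no growing cache of past activations.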
This project led to the development of our RSM algorithm.