Continual Few-Shot Learning with Hippocampal Replay

In continual learning, a neural network learns from a stream of data, acquiring new knowledge incrementally. It cannot assume that an i.i.d., stationary dataset is available in one batch. Catastrophic forgetting of previous knowledge is a well-known challenge. The wide variety of approaches falls broadly into 3 categories [6]: regularization-based methods, parameter isolation methods, and replay methods, which are inspired by hippocampal replay. Surprisingly, there have been no techniques created specifically to address continual few-shot learning (CFSL).

Few-shot learning is usually treated as a distinct area of ML. In few-shot learning, the standard framework [5, 8] involves a feature extractor trained on a large dataset. Then, in an evaluation phase, a few samples of a novel class are given as training examples and the task is to correctly identify samples of the same class from a test set, making it a ‘class-matching’ task. We see 2 major limitations with this paradigm. Firstly, knowledge of novel classes is not assimilated into the network for the long term, which is necessary for continual learning. Secondly, classification only relates to generalised classes, but reasoning about specifics is a crucial feature of animal learning. For example, you only need to be shown a bus once to be able to identify buses in general (classes), and the specific bus that you need to catch, even if it is similar to other buses (specifics). It is easy to see how this capability applies across domains from autonomous robotics to financial risk modelling.

The way in which animals learn can provide important principles for ML. An important brain region for memory and learning is the Hippocampal Formation (HF). The standard theory, Complementary Learning Systems (CLS) [4], describes how the hippocampus and neocortex interact to achieve fast continual learning. The hippocampus learns highly distinct representations from a single experience. It replays these patterns to the neocortex, which is a slow statistical learner like a conventional artificial neural network. The replay is interleaved, which allows consolidation of memories without causing catastrophic interference.

Antoniou et al. [1] aimed to combine continual learning with few-shot learning by introducing the Continual Few-Shot Learning (CFSL) framework, which parameterises the continual learning problem and allows you to define each task using a handful of parameters, including: the number of support sets (NSS), the number of support sets before the class changes (CCI), the number of classes (n-way), and the number of exposures (n-shot).
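As an illustration, the task parameters listed above could be grouped into a small configuration object. The sketch below is hypothetical: the class name `CFSLTask`, its field names and its helper methods are illustrative assumptions and are not taken from the CFSL codebase. The example values (4 support sets, CCI of 2, 5-way, 1-shot) are arbitrary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CFSLTask:
    """Hypothetical container for the four task parameters named in the text."""

    num_support_sets: int       # NSS: number of support sets in the task
    class_change_interval: int  # CCI: support sets shown before the classes change
    n_way: int                  # number of classes per support set
    n_shot: int                 # number of exposures (samples) per class

    def __post_init__(self):
        # Every parameter is a count, so it must be a positive integer.
        for name in ("num_support_sets", "class_change_interval", "n_way", "n_shot"):
            if getattr(self, name) < 1:
                raise ValueError(f"{name} must be >= 1")

    def samples_per_support_set(self) -> int:
        # Each support set holds n_shot samples for each of its n_way classes.
        return self.n_way * self.n_shot

    def total_support_samples(self) -> int:
        # Samples seen over the whole task, one support set at a time.
        return self.num_support_sets * self.samples_per_support_set()


# Arbitrary example values, chosen only to exercise the helpers.
task = CFSLTask(num_support_sets=4, class_change_interval=2, n_way=5, n_shot=1)
print(task.samples_per_support_set(), task.total_support_samples())
```

Making the object immutable (`frozen=True`) is a design choice of this sketch, so that a task definition cannot change partway through an experiment.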

Although the CFSL framework itself is flexible and scalable, the experimental results did not go far enough to bridge the gap between few-shot learning and continual learning. For instance, in continual learning problems, there is typically a much larger number of classes and support sets. However, the experiments were limited to only 5 classes and a maximum of 10 support sets.

Visual representation of the four continual few-shot task types in the CFSL framework of Antoniou et al. [1].

1. Objectives

The core objective of our project is to tackle the challenge of continual learning with very few samples, i.e. combining continual learning with few-shot learning. We aim to achieve this by building on the CFSL framework, and drawing on our prior work in memory replay and consolidation to augment standard models with a short-term memory.

The CFSL framework [1] itself should be extended by testing in a way that would be comparable to existing continual learning literature. We aim to do this by selecting a few representative baseline algorithms and scaling the existing experiments from the original benchmark, but with a larger number of classes and support sets. Similarly to our previous work [2], we would also test the ability to recognise a specific exemplar, an essential characteristic in animal learning. For instance, recognising a specific mug amongst a bunch of other mugs.

In addition to scaling the experiments, we are also interested in measuring the degradation of performance over time as more classes are added and the models iterate through one support set at a time. This can be achieved by constructing scaled experiments of different lengths (e.g. one experiment with 8 support sets and one with 80) and measuring the difference in overall performance between them.

Our hypothesis is that these models’ performance will significantly deteriorate over time, and that augmenting these models with a short-term memory (such as AHA, or a simple buffer), where previous experiences are interleaved with newer experiences, can alleviate catastrophic forgetting and incorporate the learned knowledge over time through memory consolidation. This is in line with our most recent work [3], where we demonstrated that AHA can be utilised to complement a conventional machine learning classifier, by quickly learning new classes and replaying them back to the classifier for consolidation.
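To make the 'simple buffer' variant of short-term memory concrete, the following is a minimal sketch of interleaved replay. It is not AHA and not the implementation used in our work: the class `ReplayBuffer`, its first-in-first-out eviction policy and the method names are assumptions chosen for illustration. Samples are represented as plain `(input, label)` tuples.

```python
import random
from collections import deque


class ReplayBuffer:
    """Hypothetical fixed-size short-term memory with FIFO eviction."""

    def __init__(self, capacity, seed=0):
        # deque(maxlen=...) silently drops the oldest item once it is full.
        self.items = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def interleave(self, new_batch, num_replayed):
        """Return the new samples mixed with up to num_replayed stored ones."""
        k = min(num_replayed, len(self.items))
        replayed = self.rng.sample(list(self.items), k)
        mixed = list(new_batch) + replayed
        # Shuffle so old and new experiences are interleaved, not blocked.
        self.rng.shuffle(mixed)
        # Store the new samples only after sampling, so that the replayed
        # part of the batch consists purely of earlier experiences.
        self.items.extend(new_batch)
        return mixed


buffer = ReplayBuffer(capacity=3)
support_sets = [
    [("img_a", 0), ("img_b", 1)],  # first support set: classes 0 and 1
    [("img_c", 2)],                # second support set: class 2
]
for support_set in support_sets:
    training_batch = buffer.interleave(support_set, num_replayed=2)
    # A slow learner (e.g. a classifier) would take a training step on
    # training_batch here.
    print(training_batch)
```

With an empty buffer the first batch contains only new samples; from the second support set onwards each batch also carries earlier samples, which is the interleaving described above.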

2. Preliminary Work 

Some preliminary work was done towards meeting the objectives. In our work thus far, we have chosen Pre-trained VGG, ProtoNets and SCA from the original paper as our representative algorithms for the experiments. We limited the selection to these three largely due to time and computing resource constraints.

2.1 Replication Experiments

We identified and fixed a couple of bugs within the CFSL framework and committed them back upstream to the original project on GitHub (FewShotContinualLearning). Given the significance of the patches and how they affected the overall performance of specific algorithms, we also aimed to replicate the original experiments on a limited number of tasks using the same set of representative algorithms.

2.2 Scaling Experiments

We made some progress on completing the scaling experiments. We designed the experiments to test two things independently: increased number of classes per support set and increased number of support sets. We ensured that the total number of classes is always the same, but what changes is the manner in which they are presented. For each experiment type, we conducted two tests where the number of classes was set to 20 and 200 respectively.

All scaling experiments were done using the parameters of Task D as described in the original CFSL benchmark [1]. In Task D, the same class is used across 2 support sets but with different instances of that class.

Wide Experiments

The ‘wide’ experiments consist of a small number of support sets, but with more classes in each support set. In the first experiment, the model was presented with 10 classes per support set over 4 support sets. In the second experiment, the model was presented with 100 classes per support set over 4 support sets.

Long Experiments

The ‘long’ experiments consist of a small number of classes in each support set, but with a larger number of support sets. In the first experiment, the model was presented with 5 classes per support set over 8 support sets. In the second experiment, the model was presented with 5 classes per support set over 80 support sets.
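The class totals of 20 and 200 follow from the Task D setting, in which the same classes are kept for 2 consecutive support sets. The short check below works through that arithmetic for the four experiments. The function name `total_classes` and the experiment labels are hypothetical, and the calculation assumes that every group of 2 support sets introduces entirely new classes.

```python
def total_classes(n_way, num_support_sets, class_change_interval=2):
    """Distinct classes seen when classes change every `class_change_interval` sets."""
    if num_support_sets % class_change_interval != 0:
        raise ValueError("num_support_sets must be a multiple of the interval")
    # Each group of `class_change_interval` support sets shares one set of classes.
    num_class_groups = num_support_sets // class_change_interval
    return n_way * num_class_groups


# (classes per support set, number of support sets) as described in the text.
experiments = {
    "wide, small": (10, 4),
    "wide, large": (100, 4),
    "long, small": (5, 8),
    "long, large": (5, 80),
}
for name, (n_way, num_support_sets) in experiments.items():
    print(name, total_classes(n_way, num_support_sets))
# prints 20, 200, 20 and 200 respectively
```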

3. Future Work

The next step in our work is to integrate AHA with the CFSL framework and replicate the experiments discussed so far. Our approach will be to augment the baseline VGG model with AHA utilised as its short-term memory module. This will allow us to test the feasibility of hippocampal replay in this setting and how it affects performance deterioration with the scaling experiments. This will also involve various improvements to AHA itself to facilitate working with larger batches.

There are also some challenges in the scaling experiments, where we encountered scalability and resourcing issues, particularly with large-scale models such as SCA. We were unable to complete the larger variant of each experiment type for the SCA model. The experiments for the VGG baseline and ProtoNets were achievable with minimal computational resources. We hope to either address these computational limitations, or find a suitable alternative meta-learning model to complete these experiments.

References

[1] A. Antoniou, M. Patacchiola, M. Ochal, and A. Storkey, “Defining benchmarks for continual few-shot learning,” arXiv preprint arXiv:2004.11967, 2020. 

[2] G. Kowadlo, A. Ahmed, and D. Rawlinson, “Unsupervised one-shot learning of both specific instances and generalised classes with a hippocampal architecture,” in Australasian Joint Conference on Artificial Intelligence, 2020.

[3] G. Kowadlo, A. Ahmed, and D. Rawlinson, “One-shot learning for the long term: consolidation with an artificial hippocampal algorithm,” in International Joint Conference on Neural Networks (IJCNN), 2021.

[4] D. Kumaran, D. Hassabis, and J. L. McClelland, “What Learning Systems Do Intelligent Agents Need? Complementary Learning Systems Theory Updated,” Trends in Cognitive Sciences, vol. 20, no. 7, pp. 512–534, 2016.

[5] B. M. Lake, R. Salakhutdinov, and J. B. Tenenbaum, “Human-level concept learning through probabilistic program induction,” Science, vol. 350, no. 6266, pp. 1332–1338, 2015.

[6] M. De Lange, R. Aljundi, M. Masana, S. Parisot, X. Jia, A. Leonardis, G. Slabaugh, and T. Tuytelaars, “Continual learning: A comparative study on how to defy forgetting in classification tasks,” arXiv preprint arXiv:1909.08383, vol. 2, no. 6, 2019.

[7] A. C. Schapiro, N. B. Turk-Browne, M. M. Botvinick, and K. A. Norman, “Complementary learning systems within the hippocampus: a neural network modelling approach to reconciling episodic memory with statistical learning,” Philosophical Transactions of the Royal Society B: Biological Sciences, vol. 372, no. 1711, p. 20160049, 2017. 

[8] O. Vinyals, C. Blundell, T. Lillicrap, and K. Kavukcuoglu, “Matching Networks for One Shot Learning,” in Advances in neural information processing systems, 2016.