
PERSPECTIVE article

Front. Artif. Intell., 24 March 2022
Sec. Machine Learning and Artificial Intelligence
Volume 5 - 2022 | https://doi.org/10.3389/frai.2022.829842

Is Class-Incremental Enough for Continual Learning?

  • 1Pervasive AI Lab, Computer Science Department, University of Pisa, Pisa, Italy
  • 2Class of Science, Scuola Normale Superiore, Pisa, Italy
  • 3Biometric System & Smart City Lab, Computer Science Department, University of Bologna, Bologna, Italy

The ability of a model to learn continually can be empirically assessed in different continual learning scenarios. Each scenario defines the constraints and the opportunities of the learning environment. Here, we challenge the current trend in the continual learning literature to experiment mainly on class-incremental scenarios, where classes present in one experience are never revisited. We posit that an excessive focus on this setting may be limiting for future research on continual learning, since class-incremental scenarios artificially exacerbate catastrophic forgetting, at the expense of other important objectives like forward transfer and computational efficiency. In many real-world environments, in fact, repetition of previously encountered concepts occurs naturally and contributes to softening the disruption of previous knowledge. We advocate for a more in-depth study of alternative continual learning scenarios, in which repetition is integrated by design in the stream of incoming information. Starting from already existing proposals, we describe the advantages such class-incremental with repetition scenarios could offer for a more comprehensive assessment of continual learning models.

1. Introduction

Continual learning models learn from a stream of data produced by non-stationary, dynamic environments (Parisi et al., 2019; Lesort et al., 2020). Since the data distribution may drift at any time, continual learning violates the i.i.d. assumption behind traditional machine learning training procedures, giving rise to problems like catastrophic forgetting of previous knowledge (McCloskey and Cohen, 1989).

The issues faced by a continual learning model are heavily influenced by the specific implementation of the general continual learning scenario introduced above. In recent years, the most popular scenarios all refer to an experience-based way of learning in classification tasks (van de Ven and Tolias, 2018; De Lange et al., 2021). In these scenarios, learning is broken down into a (possibly unbounded) stream of experiences S = e1, e2, e3, …, with abrupt and instantaneous drifts between one experience and the next (Lomonaco et al., 2021). Each experience ei brings a set of data, together with optional additional knowledge, like a task label (usually a scalar value) which helps to uniquely identify the distribution generating the current data (Lesort et al., 2020). The surge of interest in continual learning was initially driven by its application to deep learning methodologies and mostly oriented toward supervised computer vision tasks, like object recognition from images (Li and Hoiem, 2016; Rusu et al., 2016). Naturally, one of the most intuitive procedures to convert available computer vision benchmarks into viable continual learning benchmarks consisted in concatenating multiple datasets to simulate drifts in the data distribution (one dataset per experience, as in the protocol used by Li and Hoiem, 2016). This immediately made it possible to leverage the vast amount of existing computer vision benchmarks and to rapidly test new continual learning strategies on large-scale streams. The learning objective was to classify patterns by assuming to know from which dataset each pattern arrived (referred to as the task-incremental learning scenario in the literature; van de Ven and Tolias, 2018). This task label information simplifies the continual learning problem, since patterns from different datasets can be easily isolated by the model during both learning and inference.

After a period in which task-incremental learning remained the most studied continual learning scenario, the attention of the community has now turned to class-incremental scenarios (Rebuffi et al., 2017), where experiments are conducted on a single dataset, with patterns split by class and without any knowledge about the task label, neither during training nor during inference.

In the class-incremental setting, each experience $e_i$ contains a training dataset $D_i = \{(x_j, y_j)\}_{j=1,\dots,M}$, where $x_j$ is the input pattern and $y_j$ is its target class. The peculiar characteristic of class-incremental scenarios is that they partition the target class space by assigning a disjoint set of classes to each experience. Formally, let $C = \{c_k\}_{k=1,\dots,|C|}$ be the set of all classes seen by the model during training and let $C_i \subseteq C$ be the subset of classes present in experience $e_i$; class-incremental scenarios satisfy the following condition:

$$C_i \cap C_j = \emptyset, \quad \forall\, i \neq j. \tag{1}$$

We will refer to the constraint expressed by Equation (1) as the no repetition constraint. It simply states that classes present in one experience are never seen again in future experiences or, equivalently, that each class is present in one and only one experience.
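To make the no repetition constraint concrete, below is a minimal sketch (ours, not code from any cited work) of how a class-incremental stream is typically derived from a flat classification dataset: classes are partitioned into disjoint groups, one per experience, and the final assertion checks the condition of Equation (1). The function name class_incremental_split and its arguments are illustrative.

```python
from collections import defaultdict
import random

def class_incremental_split(targets, n_experiences, seed=0):
    """Assign each class (and its patterns) to exactly one experience."""
    classes = sorted(set(targets))
    rng = random.Random(seed)
    rng.shuffle(classes)

    # Disjoint class groups, one per experience (no repetition constraint).
    groups = [classes[i::n_experiences] for i in range(n_experiences)]

    # Collect the dataset indices belonging to each class.
    by_class = defaultdict(list)
    for idx, y in enumerate(targets):
        by_class[y].append(idx)
    experiences = [[idx for c in group for idx in by_class[c]] for group in groups]

    # Check Equation (1): class sets of different experiences never overlap.
    for i in range(n_experiences):
        for j in range(i + 1, n_experiences):
            assert set(groups[i]).isdisjoint(groups[j])
    return groups, experiences
```

For example, with targets = [0, 0, 1, 1, 2, 2, 3, 3] and n_experiences = 2, each of the two experiences receives the patterns of two of the four classes, and no class is ever revisited.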

In this work, we discuss the reasons behind the success recently achieved by the class-incremental scenario and we highlight some of the problems connected with its assumptions and requirements. In particular, we argue that class-incremental scenarios impose a strong focus on catastrophic forgetting whereas, in many real-world scenarios, this effect is less pronounced due to the natural repetition of previously encountered patterns. Class-incremental scenarios, although very convenient for quickly setting up experiments, may narrow future research paths by overshadowing other objectives like forward transfer (Lopez-Paz and Ranzato, 2017) and sample efficiency (Díaz-Rodríguez et al., 2018), fundamental requirements for the achievement of a true continual learning agent.

We do not advocate for the complete dismissal of the class-incremental scenario, which has proven useful to spark interest in continual learning and to foster the design of new solutions. Instead, we aim to promote the usage of alternative continual learning scenarios, in which previously encountered patterns may be revisited in the future. We call this family of scenarios class-incremental with repetition and we show that some examples are already present in the literature, even though they are still understudied. We believe that class-incremental with repetition scenarios constitute a promising direction for a more robust and thorough assessment of continual learners' performance.

2. Class-Incremental Scenarios

The class-incremental learning scenario is nowadays very popular in the continual learning community. Its simplicity and ease of use have greatly fostered new studies and efforts toward mitigating catastrophic forgetting, the main problem faced by models learning in this setting. Other continual learning scenarios present in the literature include task- and domain-incremental learning (van de Ven and Tolias, 2018), task-free and data-incremental learning (Aljundi et al., 2019b; De Lange and Tuytelaars, 2020), and online continual learning (Lopez-Paz and Ranzato, 2017; Aljundi et al., 2019a). Even though these scenarios propose alternative learning settings, none of them directly addresses the assumptions and constraints that characterize the class-incremental scenario. In Section 3, we will present a few notable exceptions (Stojanov et al., 2019; Lomonaco et al., 2020; Thai et al., 2021) that work on continual learning scenarios for classification based on the repetition of previously seen classes. We now turn our attention to some of the limitations caused by the no repetition constraint of class-incremental learning.

Issue 2.1. Repetition occurs naturally in many real-world environments.

Class-incremental learning is not aligned with many applications in which repetition comes directly from the environment. Examples include robotic manipulation of multiple objects, prediction of financial trends, autonomous driving tasks, etc. A learning agent exposed to a continuous stream of information should be able to incrementally acquire new knowledge, but also to forget unnecessary concepts and to prioritize learning based on some notion of importance. Not all perceived information should be treated equally: if a certain pattern never occurs again, there may be little value in continuing to predict it correctly. In fact, the statistical re-occurrence of concepts and their temporal relationships could be considered important sources of information to help determine the importance of what the agent perceives (Maltoni and Lomonaco, 2016; Cossu et al., 2021). It is very hard to discern which concepts to forget and which to reinforce if all information is treated equally.

Learning in a compartmentalized fashion hinders many of the insights an agent may draw from the complexity of the environment, ultimately limiting its ability to build a world model suited to the changing tasks it has to tackle.

Another important side effect of the no repetition constraint is the following:

Issue 2.2. Lack of repetition induces large forgetting effects.

Focusing on catastrophic forgetting would not be a problem if real-world problems were actually aligned with the characteristics of the class-incremental scenario (Thai et al., 2021). As expressed by Issue 2.1, however, this is not the case.

Moreover, Issue 2.2 has led to the generally accepted statement that replay strategies are the most effective strategies for continual learning (van de Ven et al., 2020). However, as we will show in Section 3, alternative scenarios may greatly reduce the advantage of replay whenever natural replay occurs in the environment. An excessive focus on catastrophic forgetting may also limit the scope of continual learning research, which instead involves many different problems and objectives, like optimizing the learning experience for forward transfer and few-shot learning (Lopez-Paz and Ranzato, 2017) or training continuously with limited memory and computational resources (as in edge devices; Díaz-Rodríguez et al., 2018). While many works hint at the fact that continual learning is not only about catastrophic forgetting (Díaz-Rodríguez et al., 2018; Thai et al., 2021), the continual learning scenario in which most research operates is still one in which forgetting is by far the most pressing problem.

3. Class-Incremental With Repetition Scenarios

The issues raised by the usage of class-incremental scenarios can be easily addressed by relaxing the no repetition constraint. While there are many different ways of relaxing the constraint, the current literature already offers some proposals that can be grouped into the family of class-incremental with repetition (CIR) scenarios.

An extreme example belonging to the family is the New Instances (NI) scenario, also called domain-incremental (Lomonaco and Maltoni, 2017; van de Ven and Tolias, 2018; Maltoni and Lomonaco, 2019). In NI, the first experience brings all the classes that the model will see in the subsequent experiences. Therefore, new experiences only present new instances of previously seen classes. Depending on the amount of variation in the instances, the NI scenario may not present a large amount of forgetting, as in the popular Permuted MNIST benchmark.
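As a concrete illustration of an NI-style stream, the sketch below (our assumption, in the spirit of Permuted MNIST rather than code from the cited works) builds experiences that all contain the same classes but apply a different fixed pixel permutation to the inputs; the name permuted_stream and its arguments are hypothetical.

```python
import numpy as np

def permuted_stream(x, y, n_experiences, seed=0):
    """Yield one (inputs, targets) pair per experience.

    x has shape (n_samples, n_features); y holds the class labels. Every
    experience contains all classes; only the input distribution drifts,
    because a different fixed pixel permutation is applied each time.
    """
    rng = np.random.default_rng(seed)
    n_features = x.shape[1]
    for _ in range(n_experiences):
        perm = rng.permutation(n_features)
        yield x[:, perm], y
```

Note that, unlike class-incremental streams, the label set here is identical in every experience: the drift concerns only the input distribution.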

Stojanov et al. (2019) introduce the CRIB benchmark for incremental object learning based on exposure to contiguous views. Their incremental learning paradigm considers both the class-incremental scenario and the case of repetition of previously seen objects under different exposures (e.g., different 3D views). In their experiments, the authors found that allowing even a small amount of repetition is beneficial to continual performance, which approaches that of joint (offline) training as more repetition is provided to the model. The results obtained by Stojanov et al. (2019) were further confirmed for unsupervised learning objectives, like reconstruction, by Thai et al. (2021). Their scenario with repeated exposures made it possible to bridge the gap with joint training performance without using any explicit replay of previous patterns, but only by leveraging the natural replay occurring in the environment.

ALGORITHM 1

Algorithm 1. Protocol to build a class-incremental with repetition benchmark from an existing classification dataset.

The work by Lomonaco et al. (2020) proposes a flexible setup for CIR (see Figure 1 for a depiction of the resulting scenario). The authors based their experiments on the New Instances and Classes (NIC) continual learning scenario together with the CORe50 dataset, both introduced in Lomonaco and Maltoni (2017). The NIC scenario is based on the assumption that each experience brings patterns coming from both new and previously seen classes. Therefore, it fits well within the CIR family of scenarios. Due to the properties of the NIC scenario, Lomonaco et al. (2020) were able to experiment on a stream composed of a large number of experiences. The class-incremental counterpart produces a shorter stream, since the total number of experiences is limited by the number of classes available in the dataset. The authors also provided the pseudocode of the protocol managing how many new classes are introduced and in which experience. In contrast, the repeated exposure protocol of Stojanov et al. (2019) and Thai et al. (2021) uses only a random selection to sample which previous objects will be repeated. The design of a flexible procedure to generate CIR benchmarks is an important step, since it can greatly contribute to fostering interest in the scenario itself.

FIGURE 1

Figure 1. Comparison between class presence in continual learning streams from the NIC scenario (above) and the class-incremental scenario (below). Each row represents a different class, while colors group classes into macro-categories (taken from the CORe50 benchmark; Lomonaco and Maltoni, 2017). The horizontal axis represents experiences (training batches). Gray vertical lines in the NIC scenario indicate the introduction of at least one new class in that experience (the newly introduced classes are surrounded by a red square). The NIC protocol yields a longer stream than class-incremental, with a more diverse distribution of the classes.

We now introduce a general protocol for building CIR data streams which, differently from the one by Lomonaco et al. (2020), does not depend on the choice of a specific dataset. Instead, it can be instantiated on any dataset suitable for classification tasks. Algorithm 1 presents the pseudocode for the protocol, which requires specifying the number of experiences and the classes present in each experience (patterns are sampled at random from each class). The number of patterns per class in each experience is computed by dividing the available number of patterns for that class by the number of occurrences of the class in the stream of experiences (e.g., if class c0 has 100 patterns and is present in four experiences, each experience will bring 25 patterns of that class, selected at random without replacement from all the available ones). This is a simple protocol that can be customized in many ways: for example, by providing a custom selection policy for the patterns or by generating the classes in each experience with a properly designed algorithm.
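Since the figure containing Algorithm 1 is not reproduced here, the following is a minimal Python sketch of the protocol as described above, not the authors' pseudocode verbatim; the name cir_stream and its parameters are illustrative. The caller specifies the classes present in each experience, and the patterns of every class are split evenly, without replacement, across all the experiences in which that class appears.

```python
from collections import defaultdict
import random

def cir_stream(targets, classes_per_experience, seed=0):
    """Return, for each experience, the dataset indices assigned to it."""
    rng = random.Random(seed)

    # Group dataset indices by class and shuffle them once, so that later
    # slices act as random sampling without replacement.
    by_class = defaultdict(list)
    for idx, y in enumerate(targets):
        by_class[y].append(idx)
    for pool in by_class.values():
        rng.shuffle(pool)

    # Count how many times each class occurs over the whole stream.
    occurrences = defaultdict(int)
    for exp_classes in classes_per_experience:
        for c in exp_classes:
            occurrences[c] += 1

    cursor = defaultdict(int)  # patterns of each class consumed so far
    stream = []
    for exp_classes in classes_per_experience:
        exp_indices = []
        for c in exp_classes:
            # Per-experience quota: available patterns / stream occurrences.
            per_exp = len(by_class[c]) // occurrences[c]
            exp_indices.extend(by_class[c][cursor[c]:cursor[c] + per_exp])
            cursor[c] += per_exp
        stream.append(exp_indices)
    return stream
```

For instance, with classes_per_experience = [[0, 1], [1, 2], [0, 2]], each of the three classes occurs in two experiences, so its available patterns are split in half between them; a class with 100 patterns appearing in four experiences would contribute 25 patterns to each of them.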

The contributions discussed above show that CIR is already an available and ready-to-use scenario for continual learning experiments.

4. Discussion

Continual learning is, admittedly, one of the grand challenges of artificial intelligence and machine learning. Being able to learn continuously from a progressive exposure to new concepts is fundamental for many real-world applications in which the environment is so diversified and complex that an offline training phase may never provide sufficient knowledge for the agent to succeed.

Continual learning research is still in its infancy. Therefore, it is only natural to rely on simplified experimental configurations and to design solutions that are not expected to adapt to every possible situation. In this sense, we discussed the benefits that class-incremental scenarios brought to the study of continual learning, by providing a quick experimental configuration able to exploit the vast amount of existing resources, especially in the field of computer vision. Given the rapid prototyping opportunities and the large scale of the datasets and benchmarks involved, class-incremental scenarios will remain useful to the continual learning community.

Nonetheless, it is important to realize that some assumptions behind class-incremental scenarios, the no repetition constraint in particular, are at the root of the issues described in Section 2. We presented a few alternative scenarios (Section 3) that are already present in the literature and that are better aligned with real-world environments in which repetition of previous concepts occurs naturally. Moreover, we provided an algorithm to build custom data streams for CIR scenarios which carefully balances the amount of new and previous knowledge seen by the model. The algorithm is particularly suitable for streams with a large number of small experiences (experiences carrying datasets with few patterns), a configuration present in many real-world applications that cannot be modeled with a class-incremental scenario. By putting less focus on catastrophic forgetting, CIR scenarios make it possible to better study alternative continual learning challenges, like exploiting existing representations to learn new information faster, or identifying which portion of knowledge should be kept intact and which portion may be forgotten. More generally, an environment equipped with repetition is an additional source of information that can be exploited by any continual learning agent during its lifetime.

We believe these CIR scenarios are laying the foundations for a more diverse and thorough evaluation of the performance of a continual learning model, which better encompasses the principles and objectives of continual learning.

Data Availability Statement

The original contributions presented in the study are included in the article/supplementary material, further inquiries can be directed to the corresponding author/s.

Author Contributions

ACo, ACa, and VL contributed to reviewing the literature around class-incremental and class-incremental with repetition and to writing the paper. LP and GG contributed to Figure 1 and to the organization of the pseudocode to build custom class-incremental with repetition scenarios, and provided feedback and comments on the final paper. DB and DM provided feedback on the final work and further suggestions for the discussion. All authors contributed to the article and approved the submitted version.

Funding

This work has been partially supported by the European Community H2020 programme under project TEACHING (Grant No. 871385).

Conflict of Interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher's Note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

References

Aljundi, R., Belilovsky, E., Tuytelaars, T., Charlin, L., Caccia, M., Lin, M., et al. (2019a). “Online continual learning with maximal interfered retrieval,” in Advances in Neural Information Processing Systems, Vol. 32, (Curran Associates, Inc.), 11849–11860.

Aljundi, R., Kelchtermans, K., and Tuytelaars, T. (2019b). “Task-free continual learning,” in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR). doi: 10.1109/CVPR.2019.01151

Cossu, A., Carta, A., Lomonaco, V., and Bacciu, D. (2021). Continual learning for recurrent neural networks: an empirical evaluation. Neural Netw. 143, 607–627. doi: 10.1016/j.neunet.2021.07.021

De Lange, M., and Tuytelaars, T. (2020). Continual prototype evolution: learning online from non-stationary data streams. arXiv. doi: 10.1109/ICCV48922.2021.00814

De Lange, M., Aljundi, R., Masana, M., Parisot, S., Jia, X., Leonardis, A., et al. (2021). A continual learning survey: defying forgetting in classification tasks. IEEE Trans. Pattern Anal. Mach. Intell. doi: 10.1109/TPAMI.2021.3057446

Díaz-Rodríguez, N., Lomonaco, V., Filliat, D., and Maltoni, D. (2018). Don't forget, there is more than forgetting: new metrics for continual learning. arXiv.

Lesort, T., Lomonaco, V., Stoian, A., Maltoni, D., Filliat, D., and Díaz-Rodríguez, N. (2020). Continual learning for robotics: definition, framework, learning strategies, opportunities and challenges. Inform. Fus. 58, 52–68. doi: 10.1016/j.inffus.2019.12.004

Li, Z., and Hoiem, D. (2016). “Learning without forgetting,” in European Conference on Computer Vision (Springer), 614–629. doi: 10.1007/978-3-319-46493-0_37

Lomonaco, V., Pellegrini, L., Cossu, A., Carta, A., Graffieti, G., Hayes, T. L., et al. (2021). “Avalanche: an end-to-end library for continual learning,” in CLVision Workshop at CVPR. doi: 10.1109/CVPRW53098.2021.00399

Lomonaco, V., and Maltoni, D. (2017). “CORe50: a new dataset and benchmark for continuous object recognition,” in Proceedings of the 1st Annual Conference on Robot Learning, Vol. 78 of Proceedings of Machine Learning Research, eds S. Levine, V. Vanhoucke, and K. Goldberg, 17–26.

Lomonaco, V., Maltoni, D., and Pellegrini, L. (2020). “Rehearsal-free continual learning over small non-I.I.D. batches,” in CVPR Workshop on Continual Learning for Computer Vision, 246–247. doi: 10.1109/CVPRW50498.2020.00131

Lopez-Paz, D., and Ranzato, M. (2017). “Gradient episodic memory for continual learning,” in NIPS.

Maltoni, D., and Lomonaco, V. (2019). Continuous learning in single-incremental-task scenarios. Neural Netw. 116, 56–73. doi: 10.1016/j.neunet.2019.03.010

Maltoni, D., and Lomonaco, V. (2016). “Semi-supervised tuning from temporal coherence,” in 2016 23rd International Conference on Pattern Recognition (ICPR), 2509–2514. doi: 10.1109/ICPR.2016.7900013

McCloskey, M., and Cohen, N. J. (1989). “Catastrophic interference in connectionist networks: the sequential learning problem,” in Psychology of Learning and Motivation, Vol. 24, ed Gordon H. Bower (Elsevier: Academic Press), 109–165. doi: 10.1016/S0079-7421(08)60536-8

Parisi, G. I., Kemker, R., Part, J. L., Kanan, C., and Wermter, S. (2019). Continual lifelong learning with neural networks: a review. Neural Netw. 113, 54–71. doi: 10.1016/j.neunet.2019.01.012

Rebuffi, S.-A., Kolesnikov, A., Sperl, G., and Lampert, C. H. (2017). “iCaRL: incremental classifier and representation learning,” in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR). doi: 10.1109/CVPR.2017.587

Rusu, A. A., Rabinowitz, N. C., Desjardins, G., Soyer, H., Kirkpatrick, J., Kavukcuoglu, K., et al. (2016). Progressive neural networks. arXiv.

Stojanov, S., Mishra, S., Thai, N. A., Dhanda, N., Humayun, A., Yu, C., et al. (2019). “Incremental object learning from contiguous views,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 8777–8786. doi: 10.1109/CVPR.2019.00898

Thai, A., Stojanov, S., Rehg, I., and Rehg, J. M. (2021). Does continual learning = catastrophic forgetting? arXiv.

van de Ven, G. M., Siegelmann, H. T., and Tolias, A. S. (2020). Brain-inspired replay for continual learning with artificial neural networks. Nat. Commun. 11:4069. doi: 10.1038/s41467-020-17866-2

van de Ven, G. M., and Tolias, A. S. (2018). “Three scenarios for continual learning,” in Continual Learning Workshop NeurIPS.

Keywords: continual learning, lifelong learning, catastrophic forgetting, class-incremental, class-incremental with repetition

Citation: Cossu A, Graffieti G, Pellegrini L, Maltoni D, Bacciu D, Carta A and Lomonaco V (2022) Is Class-Incremental Enough for Continual Learning? Front. Artif. Intell. 5:829842. doi: 10.3389/frai.2022.829842

Received: 06 December 2021; Accepted: 25 February 2022;
Published: 24 March 2022.

Edited by:

Tongliang Liu, The University of Sydney, Australia

Reviewed by:

Zheda Mai, University of Toronto, Canada

Copyright © 2022 Cossu, Graffieti, Pellegrini, Maltoni, Bacciu, Carta and Lomonaco. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Andrea Cossu, andrea.cossu@sns.it
