The ravages of concept drift in stream learning applications and how to deal with it
Stream data processing has steadily gained momentum with the arrival of new streaming applications and big data scenarios. These data streams generally evolve over time and may occasionally be affected by a change (concept drift). Handling such changes with detection and adaptation mechanisms is crucial in many real-world systems.
By Jesús López, Data Scientist and Researcher in Machine Learning and Artificial Intelligence
Introduction
The Big Data paradigm has gained momentum over the last decade because of its promise to deliver valuable insights to many real-world applications. With the advent of this emerging paradigm comes not only an increase in the volume of available data, but also the notion of its arrival velocity: real-world applications generate data in real time, at rates faster than traditional systems can handle. This leads us to assume that we have to deal with potentially infinite and ever-growing datasets that arrive continuously (stream learning), either in batches of instances or instance by instance, in contrast to traditional systems that have free access to all historical data. Such traditional processing systems assume that data are at rest and can be accessed simultaneously; models built on them do not continuously integrate new information into already constructed models but, instead, regularly reconstruct new models from scratch. Incremental learning [1], as carried out by stream learning methods, is better suited to this setting: it continuously incorporates new information into its models while traditionally aiming at minimal processing time and memory. Stream learning also presents many new challenges [2] and poses stringent conditions: only a single instance (or a small batch of instances) is provided to the learning algorithm at every time instant, processing time is very limited, memory is finite, and trained models must be available at any point of the stream. In addition, these streams of data may evolve over time and may occasionally be affected by a change (abrupt, gradual, etc., as in Figure 1) in their data distribution (concept drift), forcing models to learn and adapt under non-stationary conditions.
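To make this processing mode concrete, here is a minimal sketch of instance-by-instance learning, assuming scikit-learn is available; the stream generator, feature dimensionality, and labeling rule are made up purely for illustration. The model is updated with partial_fit one instance at a time, so time and memory per instance stay constant however long the stream runs.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.RandomState(0)

def simulated_stream():
    """Stand-in for a potentially infinite stream of (x, y) instances."""
    while True:
        x = rng.normal(size=(1, 5))
        y = np.array([int(x[0, 0] + x[0, 1] > 0)])  # made-up concept for illustration
        yield x, y

model = SGDClassifier()          # supports incremental updates via partial_fit
classes = np.array([0, 1])       # all labels must be declared on the first call

for t, (x, y) in enumerate(simulated_stream()):
    model.partial_fit(x, y, classes=classes)  # constant time and memory per instance
    if t >= 10_000:              # a real stream would never end; we stop for the demo
        break
```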
We can find many examples of real-world stream learning applications [3], such as mobile phones, industrial process control, intelligent user interfaces, intrusion detection, spam detection, fraud detection, loan recommendation, and monitoring and traffic management, among others. In this context, the Internet of Things (IoT) has become one of the main application areas of stream learning, since it continuously produces huge quantities of data in real time. The IoT consists of sensors and actuators connected by networks to computing systems, which monitor and manage the health and actions of connected objects or machines in real time. Stream data analysis is therefore becoming a standard way to extract useful knowledge from what is happening at each moment, allowing people and organizations to react quickly when problems emerge or new trends appear, helping them improve their performance.
The impact of concept drift on data streams
Predictive models trained over such evolving data streams need to adapt to these changes (drifts) as fast as possible while maintaining good performance scores (e.g. accuracy), ideally using minimal time and memory at the same time. Otherwise, models trained over these data become obsolete and no longer suit the new distribution.
In order to understand the impact of concept drift on these evolving data streams, let’s look at two examples. Figure 2 clearly shows such a case. The stream learning scenario starts at t=0, and during the first 100 instances both models are pre-trained. From then on, they learn incrementally, one instance at a time. During the stable phase (the first concept, in the absence of drift), both models perform equally well. But after an abrupt drift occurs at t=1000, a new concept emerges, and the model without detection and adaptation mechanisms (solid line) starts to worsen its predictive performance (i.e. prequential accuracy). The model with detection and adaptation mechanisms (dotted line), however, forgets the old concept and quickly learns the new one (adaptation), providing competitive predictive performance. Finally, Figure 3 shows a binary classification problem in which one of the models (upper) does not adapt to the new distribution after the drift occurs, and thus is not able to correctly classify the incoming instances, whereas the other one (lower) adapts to the new situation and correctly classifies them. Note that the adaptation mechanism is sometimes carried out in a passive manner, but models frequently need a drift detector to know the best moment to trigger their adaptation (active manner).
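A scenario similar to the one in Figure 2 can be reproduced in a few lines of code. In the sketch below (the stream, the drift point at t=1000, and the models are made up for illustration, and scikit-multiflow's ADWIN detector is used as a stand-in for whatever detector produced the figure), two incremental classifiers process the same stream: both keep learning, but one is additionally reset whenever the detector signals a change, so it forgets the outdated concept faster. Prequential (test-then-train) accuracy is tracked for both.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier
from skmultiflow.drift_detection import ADWIN   # detector fed with the 0/1 error stream

rng = np.random.RandomState(1)
classes = np.array([0, 1])

def instance(t):
    """Abrupt drift at t = 1000: the feature-label mapping flips."""
    x = rng.normal(size=(1, 2))
    label = int(x[0, 0] > 0) if t < 1000 else int(x[0, 0] <= 0)
    return x, np.array([label])

plain = SGDClassifier()        # no detection mechanism: adapts only through updates
adaptive = SGDClassifier()     # additionally reset when the detector fires
detector = ADWIN()

hits = {"plain": [], "adaptive": []}
for t in range(3000):
    x, y = instance(t)
    if t > 0:                                   # prequential: test first, then train
        hits["plain"].append(int(plain.predict(x)[0] == y[0]))
        err = int(adaptive.predict(x)[0] != y[0])
        hits["adaptive"].append(1 - err)
        detector.add_element(err)               # monitor the error rate
        if detector.detected_change():          # drift signal: forget the old concept
            adaptive = SGDClassifier()
    plain.partial_fit(x, y, classes=classes)
    adaptive.partial_fit(x, y, classes=classes)

for name, h in hits.items():
    print(name, "accuracy after the drift:", np.mean(h[1000:]).round(3))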
Drift detection and adaptation mechanisms are therefore key ingredients of stream learning in evolving environments.
Handling concept drift with detection and/or adaptation
As previously mentioned, in stream learning we cannot explicitly store all past data to detect or quantify change, so concept drift detection and adaptation become major challenges for real-time algorithms [4]. Two change management strategies are usually distinguished to deal with concept drift [5]: passive (the model is updated continuously every time new data instances are received) and active (the model is updated only when a drift is detected). Both can be successful in practice; the reason for choosing one strategy over the other is typically application-specific. In general, a passive strategy has proven quite effective in prediction settings with gradual drifts and recurring concepts, while an active strategy works well in settings where the drift is abrupt. Besides, a passive strategy is generally better suited for batch learning, whereas an active strategy has been shown to work well in online settings as well. The schematic sketch below contrasts the two strategies.
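Assuming a hypothetical incremental model with a partial_fit-style update, a scikit-multiflow-style detector (add_element/detected_change), and a retrain routine (all placeholders rather than any specific library's full API), the two strategies differ only in when the model is touched:

```python
def passive_step(model, x, y):
    """Passive strategy: update the model on every incoming instance, no detector."""
    model.partial_fit(x, y)
    return model

def active_step(model, detector, x, y, retrain):
    """Active strategy: monitor the error stream; adapt only when drift is signalled."""
    error = int(model.predict(x)[0] != y[0])
    detector.add_element(error)
    if detector.detected_change():
        model = retrain()   # e.g. rebuild the model from recently buffered instances
    return model
```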
Drift detection mechanisms quantitatively characterize concept drift events by identifying change points or small periods of time (windows) during which these changes occur. Drift detectors may return not only drift signals but also warning signals, usually conceived as the moment when a change is suspected and a new training set representing the new concept should start being collected. Drift detection is not a trivial task: on the one hand, detection must be fast enough to quickly replace the outdated model and reduce the restoration time (the transient before returning to a stable period); on the other hand, too many false alarms (signals when there is no real drift in the stream) are undesirable, because the repeated application of drift handling techniques can be counterproductive. Figure 4 shows an example of a drift detection mechanism.
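Detectors such as DDM (available in scikit-multiflow) expose exactly this two-level warning/drift logic. In the minimal sketch below, the stream of 0/1 prediction errors is simulated (in practice it would come from a real model's predictions); a warning signal starts collecting a buffer for the new concept, and a drift signal is where retraining would be triggered.

```python
import numpy as np
from skmultiflow.drift_detection import DDM

rng = np.random.RandomState(7)
ddm = DDM()
warning_buffer = []   # instance indices collected while a change is suspected

# Simulated 0/1 prediction errors: the error rate jumps from 10% to 50% at t = 1000
errors = np.concatenate([rng.binomial(1, 0.1, 1000), rng.binomial(1, 0.5, 1000)])

for t, err in enumerate(errors):
    ddm.add_element(err)                 # 1 means the instance was misclassified
    if ddm.detected_change():
        print(f"Drift confirmed at t={t}; retrain on the {len(warning_buffer)} "
              "instances buffered since the warning signal")
        warning_buffer = []
    elif ddm.detected_warning_zone():
        warning_buffer.append(t)         # change suspected: start collecting data
    else:
        warning_buffer = []              # suspicion faded away: discard the buffer
```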
Frameworks to deal with incremental learning and concept drift
The frameworks referenced here contain software implementations of streaming algorithms that work in both stationary and non-stationary scenarios. We do not claim this list to be exhaustive, but it provides several entry points for novices to get started and for established researchers to expand their contributions.
- MOA¹ is probably the most popular open-source Java framework for data stream mining. It is focused on incremental learning, and also on drift adaptation or detection.
- Scikit-Multiflow² is implemented in Python, given the language's increasing popularity in the Machine Learning community, and it is inspired by MOA. It contains a collection of Machine Learning algorithms, datasets, tools, and metrics for stream learning evaluation. It is focused on incremental learning, and also on drift adaptation or detection (see the sketch after this list for a minimal usage example).
- Although scikit-learn³ is mainly focused on batch learning, this Python framework also provides some incremental learning methods, such as Multinomial Naive Bayes, the Perceptron, a Stochastic Gradient Descent classifier, and a Passive Aggressive classifier, among others. It does not provide drift adaptation or detection mechanisms.
- Spark Streaming⁴ is an extension of the core Spark API that enables scalable, high-throughput, fault-tolerant stream processing of live data streams. It is mainly focused on incremental learning, and not on drift adaptation or detection.
- Creme⁵ is a Python library mainly focused on incremental learning, and not on drift adaptation or detection.
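As a taste of how little code a drift experiment takes in one of these frameworks, here is a sketch using scikit-multiflow: a synthetic stream with an abrupt concept drift is evaluated prequentially with a Hoeffding tree. The generator settings are arbitrary, and the class names are those of recent releases (for instance, HoeffdingTreeClassifier was called HoeffdingTree in older versions, and older versions also required calling stream.prepare_for_use() before evaluation).

```python
from skmultiflow.data import SEAGenerator, ConceptDriftStream
from skmultiflow.trees import HoeffdingTreeClassifier   # 'HoeffdingTree' in older releases
from skmultiflow.evaluation import EvaluatePrequential

# Synthetic stream with an abrupt concept drift at instance 5000
stream = ConceptDriftStream(stream=SEAGenerator(classification_function=0),
                            drift_stream=SEAGenerator(classification_function=2),
                            position=5000, width=1)

model = HoeffdingTreeClassifier()

# Prequential (test-then-train) evaluation over 10000 instances
evaluator = EvaluatePrequential(max_samples=10000, pretrain_size=200,
                                metrics=['accuracy'], show_plot=False)
evaluator.evaluate(stream=stream, model=model, model_names=['HT'])
```

Plotting the resulting prequential accuracy over time makes the drop at the drift point, and the subsequent recovery, immediately visible, which is essentially how figures such as Figure 2 are produced.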
FOOTNOTES
1. https://moa.cms.waikato.ac.nz/
2. https://scikit-multiflow.github.io/
3. https://scikit-learn.org/stable/
4. https://spark.apache.org/docs/latest/streaming-programming-guide.html
5. https://github.com/creme-ml/creme
REFERENCES
[1] Losing, V., Hammer, B., & Wersing, H. (2018). Incremental on-line learning: A review and comparison of state of the art algorithms. Neurocomputing, 275, 1261–1274.
[2] Lu, J., Liu, A., Dong, F., Gu, F., Gama, J., & Zhang, G. (2018). Learning under concept drift: A review. IEEE Transactions on Knowledge and Data Engineering.
[3] Žliobaitė, I., Pechenizkiy, M., & Gama, J. (2016). An overview of concept drift applications. In Big Data Analysis: New Algorithms for a New Society (pp. 91–114). Springer.
[4] Lu, J., Liu, A., Dong, F., Gu, F., Gama, J., & Zhang, G. (2018). Learning under concept drift: A review. IEEE Transactions on Knowledge and Data Engineering.
[5] Ditzler, G., Roveri, M., Alippi, C., & Polikar, R. (2015). Learning in nonstationary environments: A survey. IEEE Computational Intelligence Magazine, 10, 12–25.
Bio: Dr. Jesús López (@txuslopez) is currently based in Bilbao (Spain) working at TECNALIA as Data Scientist and Researcher in Machine Learning and Artificial Intelligence. His research topics are real-time data mining (stream learning), concept drift, continual learning, anomaly detection, spiking neural networks, and cellular automata for data mining. Not forgetting the leisure side, he also loves to go outdoors to surf all over the globe.