scikit-multiflow is an open-source machine learning package for streaming data. It extends the scientific tools available in the Python ecosystem.
scikit-multiflow is intended for streaming data applications where data is continuously generated and must be processed and analyzed on the go. Data samples are not stored, so learning methods are exposed to new data only once.
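The single-pass setting described above can be illustrated with a minimal test-then-train loop, in which each sample is first used for prediction and then for learning before being discarded. This is only a sketch in plain Python: the class `RunningCentroidClassifier`, its `predict_one` and `learn_one` methods, and the synthetic `sample_stream` generator are hypothetical names made up for illustration and are not part of the scikit-multiflow API.

```python
import random


class RunningCentroidClassifier:
    """Toy incremental classifier (hypothetical): one running mean per class."""

    def __init__(self):
        self.counts = {}     # class label -> number of samples seen
        self.centroids = {}  # class label -> running mean of the feature

    def predict_one(self, x):
        if not self.centroids:
            return None  # nothing learned yet
        # Predict the class whose running mean is closest to x.
        return min(self.centroids, key=lambda c: abs(x - self.centroids[c]))

    def learn_one(self, x, y):
        # Incremental mean update: no past samples are stored.
        n = self.counts.get(y, 0) + 1
        mean = self.centroids.get(y, 0.0)
        self.counts[y] = n
        self.centroids[y] = mean + (x - mean) / n


def sample_stream(n_samples, seed=0):
    """Synthetic stream (assumption): two classes with means 0 and 3."""
    rng = random.Random(seed)
    for _ in range(n_samples):
        y = rng.randint(0, 1)
        x = rng.gauss(3.0 * y, 1.0)
        yield x, y


def prequential_accuracy(stream, model):
    """Test-then-train: predict on each sample, then learn from it once."""
    correct = total = 0
    for x, y in stream:
        correct += int(model.predict_one(x) == y)
        total += 1
        model.learn_one(x, y)
    return correct / total if total else 0.0


accuracy = prequential_accuracy(sample_stream(2000), RunningCentroidClassifier())
print(f"prequential accuracy: {accuracy:.3f}")
```

Because the model keeps only a count and a mean per class, its memory use stays constant no matter how many samples arrive.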
The (theoretically) infinite nature of data streams poses additional challenges. While data is unbounded, resources such as memory and time are limited, so stream learning methods must be efficient. Additionally, dynamic environments imply that data can change over time. A change in the distribution of the data is known as concept drift, and it can degrade model performance if not handled properly. Drift-aware stream learning methods are specifically designed to be robust against this phenomenon.
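As a rough illustration of how concept drift can be noticed with bounded memory, the sketch below compares a model's error rate over a recent window against a reference window and flags drift when the gap exceeds a threshold. This is a deliberately simple two-window heuristic, not one of the established drift detectors, and it does not use scikit-multiflow. The class `WindowDriftDetector`, its parameters, and the simulated error stream are all assumptions made for this example.

```python
import random
from collections import deque


class WindowDriftDetector:
    """Hypothetical two-window detector over a stream of 0/1 errors."""

    def __init__(self, window_size=200, threshold=0.2):
        self.window_size = window_size
        self.threshold = threshold
        # Fixed-size windows keep memory use bounded.
        self.reference = deque(maxlen=window_size)
        self.recent = deque(maxlen=window_size)

    def update(self, error):
        """Add one error indicator; return True if drift is flagged."""
        if len(self.reference) < self.window_size:
            self.reference.append(error)  # still building the reference
            return False
        self.recent.append(error)
        if len(self.recent) < self.window_size:
            return False
        ref_rate = sum(self.reference) / self.window_size
        recent_rate = sum(self.recent) / self.window_size
        if recent_rate - ref_rate > self.threshold:
            # Forget the old concept and rebuild both windows.
            self.reference.clear()
            self.recent.clear()
            return True
        return False


def simulate_errors(n_samples, drift_at, seed=1):
    """Simulated errors (assumption): rate jumps from 0.1 to 0.5 at drift_at."""
    rng = random.Random(seed)
    for i in range(n_samples):
        p_error = 0.1 if i < drift_at else 0.5
        yield int(rng.random() < p_error)


detector = WindowDriftDetector()
detections = [
    i for i, err in enumerate(simulate_errors(2000, drift_at=1000))
    if detector.update(err)
]
print("drift flagged at sample indices:", detections)
```

In a drift-aware learner, a flag like this would typically trigger a response such as resetting or replacing the model. The window size and threshold trade off detection delay against false alarms.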
scikit-multiflow is part of the stream learning ecosystem. Other tools include MOA, the most popular open-source machine learning framework for data streams, and MEKA, an open-source implementation of methods for multi-label learning. Both MOA and MEKA are written in Java.
scikit-multiflow complements packages such as scikit-learn, whose primary focus is batch learning.
Development of scikit-multiflow is carried out by the development team and the open-source community. Current members of the development team (in alphabetical order):
Montiel, J., Read, J., Bifet, A., & Abdessalem, T. (2018). Scikit-multiflow: A multi-output streaming framework. The Journal of Machine Learning Research, 19(72):1−5.
The scikit-multiflow logo is based on a design by Vectortwins / Freepik. The font used is Flux Regular.