scikit-multiflow is an open-source machine learning package for streaming data. It extends the scientific tools available in the Python ecosystem. scikit-multiflow is intended for streaming data applications where data is continuously generated and must be processed and analyzed on the go. Data samples are not stored, so learning methods are exposed to new data only once.
The (theoretically) infinite nature of data streams poses additional challenges. While data is unbounded, resources such as memory and time are limited, so stream learning methods must be efficient. Additionally, dynamic environments imply that data can change over time. A change in the distribution of the data is known as concept drift, and it can degrade model performance if not handled properly. Drift-aware stream learning methods are specifically designed to be robust against this phenomenon.
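The effect of concept drift on a one-pass learner can be illustrated with a small sketch. The code below is plain Python and does not use the scikit-multiflow API; the names `drifting_stream`, `NearestMeanLearner`, `learn_one`, and `accuracy_after_drift` are hypothetical and chosen only for this illustration. It assumes a synthetic one-dimensional stream whose labeling rule flips halfway through, and compares a learner that averages over all past data with one that gradually forgets old data. Each sample is used for prediction first and for training second, and is never stored.

```python
import random


def drifting_stream(n_samples, drift_at, seed=42):
    """Yield (x, y) pairs one at a time; the labeling rule flips at `drift_at`."""
    rng = random.Random(seed)
    for i in range(n_samples):
        x = rng.random()
        # Before the drift, class 1 is x > 0.5; afterwards the concept is inverted.
        y = int(x > 0.5) if i < drift_at else int(x <= 0.5)
        yield x, y


class NearestMeanLearner:
    """Toy incremental classifier that keeps one running mean per class.

    With alpha=None the means average over the whole history (no forgetting).
    With a fixed alpha the means are exponentially weighted, so old data fades.
    """

    def __init__(self, alpha=None):
        self.alpha = alpha
        self.means = {0: 0.5, 1: 0.5}
        self.counts = {0: 0, 1: 0}

    def predict(self, x):
        # Predict the class whose running mean is closest to x.
        return 1 if abs(x - self.means[1]) < abs(x - self.means[0]) else 0

    def learn_one(self, x, y):
        # Constant-memory update: the sample itself is discarded afterwards.
        self.counts[y] += 1
        step = self.alpha if self.alpha is not None else 1.0 / self.counts[y]
        self.means[y] += step * (x - self.means[y])


def accuracy_after_drift(learner, n_samples=2000, drift_at=1000):
    """Test-then-train over the stream; report accuracy on post-drift samples."""
    correct = 0
    for i, (x, y) in enumerate(drifting_stream(n_samples, drift_at)):
        if i >= drift_at and learner.predict(x) == y:
            correct += 1
        learner.learn_one(x, y)
    return correct / (n_samples - drift_at)


static_acc = accuracy_after_drift(NearestMeanLearner())
adaptive_acc = accuracy_after_drift(NearestMeanLearner(alpha=0.1))
print(f"accuracy after drift, no forgetting:   {static_acc:.2f}")
print(f"accuracy after drift, with forgetting: {adaptive_acc:.2f}")
```

In this toy setting the learner without forgetting stays anchored to the old concept for most of the post-drift period, while the exponentially weighted variant recovers after a short transition. Forgetting is only one simple way to cope with drift; explicit drift detection is another.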
scikit-multiflow is part of the stream learning ecosystem. Other tools include MOA, the most popular open source machine learning framework for data streams, and MEKA, an open source implementation of methods for multi-label learning. Both MOA and MEKA are written in Java.
In Python, scikit-multiflow complements packages such as scikit-learn, whose primary focus is batch learning.
Development of scikit-multiflow is performed by the development team and the open-source community. Current members of the development team (in alphabetical order):
A list including past members and major contributors is available here. We also acknowledge the individual members of the open-source community who have contributed to this project.
If scikit-multiflow has been useful for your research and you would like to cite it in an academic publication, please use the following paper (bibtex):
Montiel, J., Read, J., Bifet, A., & Abdessalem, T. (2018). Scikit-multiflow: A multi-output streaming framework. The Journal of Machine Learning Research, 19(72):1−5.
The scikit-multiflow logo is based on the design by Vectortwins / Freepik. The font used is Flux Regular.