The upcoming release 0.3.0
will introduce an improved base object class BaseSMKObject
which is consistent with sklearn.BaseEstimator
.
This is a low-level change with direct impact on most components within scikit-multiflow
. We have tried to minimize the impact on the end users and this post aims to reduce the uncertainty from this change.
Here are the considerations to have in mind to ensure a smooth transition, depending on your usage of scikit-multiflow
.
If you are using scikit-multiflow
methods without modifying them then the change should have minimum impact on your workflow. However some methods’ signatures were updated:
DataStream
cat_features_idx
-> cat_features
name
: new attribute to define a name for your dataFileStream
cat_features_idx
-> cat_features
PageHinkley
min_num_instances
-> min_instances
KNN
categorical_list
-> nominal_attributes
Similarly, partial_fit
and fit
methods where updated for consistency with scikit-learn
methods:
weight
-> sample_weight
Note: “->” means “renamed to”
In case that you are modifying the scikit-multiflow
methods or extending them to create your own methods then there are some considerations to take into account additional to the ones mentioned above.
The skmultiflow.core.BaseSKMObject
class is the new base class in scikit-multiflow
. It is based on sklearn.BaseEstimator
in order to support inter-framework compatibility and adds extra functionality relevant in the context of scikit-multiflow
.
Stream models (estimators) in scikit-multiflow
are now created by extending the BaseSKMObject
class and the corresponding task-specific mixin(s): ClassifierMixin
, RegressorMixin
, MetaEstimatorMixin
, MultiOutputMixin
The ClassifierMixin
defines the following methods:
fit
– Trains a model in a batch fashion. Works as a an interface to batch methods that implement a fit()
functions such as scikit-learn
methods.partial_fit
– Incrementally trains a stream model.predict
– Predicts the target’s value in supervised learning methods.predict_proba
– Calculates the probability of a sample pertaining to a given class in classification problems.The RegressorMixin
defines the same methods for the regression setting with minor differences in the methods’ signatures.
If you are interested in finding more about this change you can check the corresponding issue and PR in GitHub. Or you can contact us through the user’s group or gitter channel.