The upcoming release 0.3.0 will introduce an improved base object class, BaseSKMObject, which is consistent with sklearn.base.BaseEstimator.
This is a low-level change with direct impact on most components within scikit-multiflow. We have tried to minimize the impact on the end users and this post aims to reduce the uncertainty from this change.
Here are the considerations to have in mind to ensure a smooth transition, depending on your usage of scikit-multiflow.
If you are using scikit-multiflow methods without modifying them, then the change should have minimal impact on your workflow. However, the signatures of some methods were updated:
DataStream
  cat_features_idx -> cat_features
  name: new attribute to define a name for your data
FileStream
  cat_features_idx -> cat_features
PageHinkley
  min_num_instances -> min_instances
KNN
  categorical_list -> nominal_attributes
Similarly, the partial_fit and fit methods were updated for consistency with scikit-learn methods:
weight -> sample_weight
Note: “->” means “renamed to”
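If your own code passes these arguments by keyword, the renames listed above can be applied mechanically. The following is a minimal sketch, not part of scikit-multiflow: the RENAMED_KWARGS table and the migrate_kwargs helper are hypothetical names introduced here for illustration, and the table only contains the renames listed in this post.

```python
# Old -> new keyword names, copied from the list in this post.
# RENAMED_KWARGS and migrate_kwargs are hypothetical helpers, not library API.
RENAMED_KWARGS = {
    "DataStream": {"cat_features_idx": "cat_features"},
    "FileStream": {"cat_features_idx": "cat_features"},
    "PageHinkley": {"min_num_instances": "min_instances"},
    "KNN": {"categorical_list": "nominal_attributes"},
    "fit": {"weight": "sample_weight"},
    "partial_fit": {"weight": "sample_weight"},
}


def migrate_kwargs(target, kwargs):
    """Return a copy of kwargs with pre-0.3.0 names replaced by the new ones.

    target is the class or method name the keyword arguments are meant for.
    Names that were not renamed are passed through unchanged.
    """
    renames = RENAMED_KWARGS.get(target, {})
    migrated = {}
    for key, value in kwargs.items():
        new_key = renames.get(key, key)
        if new_key in migrated:
            # Both the old and the new name were supplied; refuse to guess.
            raise ValueError(f"duplicate argument after renaming: {new_key!r}")
        migrated[new_key] = value
    return migrated


# Example: arguments written for the old PageHinkley signature.
old_style = {"min_num_instances": 30}
new_style = migrate_kwargs("PageHinkley", old_style)
print(new_style)
```

The helper returns a new dictionary and leaves the input untouched, so it can be used while call sites are being updated one at a time.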
If you are modifying the scikit-multiflow methods or extending them to create your own, there are some additional considerations beyond the ones mentioned above.
The skmultiflow.core.BaseSKMObject class is the new base class in scikit-multiflow. It is based on sklearn.base.BaseEstimator in order to support inter-framework compatibility and adds extra functionality relevant in the context of scikit-multiflow.
Stream models (estimators) in scikit-multiflow are now created by extending the BaseSKMObject class and the corresponding task-specific mixin(s): ClassifierMixin, RegressorMixin, MetaEstimatorMixin and MultiOutputMixin.
The ClassifierMixin defines the following methods:
fit – Trains a model in a batch fashion. Works as an interface to batch methods that implement a fit() function, such as scikit-learn methods.
partial_fit – Incrementally trains a stream model.
predict – Predicts the target’s value in supervised learning methods.
predict_proba – Calculates the probability of a sample belonging to a given class in classification problems.
The RegressorMixin defines the same methods for the regression setting, with minor differences in the methods’ signatures.
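To make the pattern concrete, here is a rough sketch of a toy classifier built from BaseSKMObject and ClassifierMixin. Several things in it are assumptions rather than confirmed details of the release: the class MajorityClassSketch is hypothetical, ClassifierMixin is assumed to be importable from skmultiflow.core alongside BaseSKMObject, and the classes argument in the method signatures is assumed (only sample_weight is named in this post). Empty stand-in classes are used when scikit-multiflow is not installed, so the example runs either way, and plain lists are used instead of arrays to keep it dependency-free.

```python
from collections import Counter

try:
    # Assumed import location for the mixin; BaseSKMObject's is given in the post.
    from skmultiflow.core import BaseSKMObject, ClassifierMixin
except ImportError:
    # Stand-ins so the sketch runs without scikit-multiflow installed.
    class BaseSKMObject:
        pass

    class ClassifierMixin:
        pass


class MajorityClassSketch(BaseSKMObject, ClassifierMixin):
    """Hypothetical toy model: always predicts the most frequent label seen.

    predict and predict_proba assume the model has seen at least one sample.
    """

    def __init__(self):
        self.class_counts = Counter()

    def partial_fit(self, X, y, classes=None, sample_weight=None):
        # Incremental training: add the (weighted) label counts of this batch.
        if sample_weight is None:
            sample_weight = [1.0] * len(y)
        for label, weight in zip(y, sample_weight):
            self.class_counts[label] += weight
        return self

    def fit(self, X, y, classes=None, sample_weight=None):
        # Batch training: forget previous counts, then train on the full data.
        self.class_counts = Counter()
        return self.partial_fit(X, y, classes=classes, sample_weight=sample_weight)

    def predict(self, X):
        # The same majority label is returned for every sample in X.
        majority = max(self.class_counts, key=self.class_counts.get)
        return [majority for _ in X]

    def predict_proba(self, X):
        # One row per sample; columns follow the sorted order of seen labels.
        total = sum(self.class_counts.values())
        row = [self.class_counts[label] / total for label in sorted(self.class_counts)]
        return [list(row) for _ in X]


model = MajorityClassSketch()
model.partial_fit([[0.0], [1.0], [2.0]], [1, 0, 1])
print(model.predict([[5.0]]))
```

A real stream model would use the features in X; the point here is only the shape of the class: one base class, one task-specific mixin, and the four methods listed above.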
If you are interested in finding out more about this change, you can check the corresponding issue and PR on GitHub, or contact us through the users’ group or Gitter channel.