skmultiflow.core.Pipeline

class skmultiflow.core.Pipeline(steps)

[Experimental] Holds a sequence of operations (transforms), followed by a single estimator.

It simplifies working with datasets that require several transformation steps before being passed to a learner. It also allows several steps to be cross-validated together.

Each intermediate step should extend the BaseTransform class, or at least implement either both transform and partial_fit, or partial_fit_transform.

The last step should be an estimator (learner), so it must implement at least partial_fit and predict.

Because its last step is an estimator, the Pipeline itself behaves like an estimator: it can be passed directly to evaluation objects as if it were a learner.
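The method contract described above can be sketched without the library. The classes `ScaleBy` and `MajorityClass` and the helper `train_then_predict` below are hypothetical stand-ins, not part of scikit-multiflow; they only illustrate which methods each kind of step is expected to expose and the order in which they would be called.

```python
from collections import Counter


class ScaleBy:
    """Hypothetical transform: exposes partial_fit and transform."""

    def __init__(self, factor):
        self.factor = factor

    def partial_fit(self, X, y=None):
        # A fixed scaling factor has nothing to learn.
        return self

    def transform(self, X):
        return [[value * self.factor for value in row] for row in X]


class MajorityClass:
    """Hypothetical learner: exposes partial_fit and predict."""

    def __init__(self):
        self.counts = Counter()

    def partial_fit(self, X, y, classes=None):
        self.counts.update(y)
        return self

    def predict(self, X):
        label = self.counts.most_common(1)[0][0]
        return [label for _ in X]


def train_then_predict(steps, X, y):
    """Pass X through every transform, train the last step, then predict."""
    *transforms, (_, learner) = steps
    Xt = X
    for _, transform in transforms:
        Xt = transform.partial_fit(Xt, y).transform(Xt)
    learner.partial_fit(Xt, y)
    return learner.predict(Xt)


steps = [("scale", ScaleBy(2)), ("majority", MajorityClass())]
predictions = train_then_predict(steps, [[1, 2], [3, 4], [5, 6]], [0, 1, 1])
```

Any object exposing the same method names could take the place of either toy class, which is why the text only asks that steps "at least implement" them.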

Parameters

steps (list of tuple) – List of tuples containing the transforms and the final estimator. Transforms are optional, but the final estimator is required. Each tuple should have the form ('name', estimator).

Raises
  • TypeError – If the intermediate steps or the final estimator do not implement the functions the pipeline needs in order to work.

  • NotImplementedError – Some of the functions are yet to be implemented.
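As a rough illustration of the kind of check that would lead to a TypeError, the sketch below validates a steps list with `hasattr`. The helper `check_steps` and the class `NoPredict` are hypothetical; the library's own validation may differ in detail and in its error messages.

```python
def check_steps(steps):
    """Raise TypeError if any step lacks the methods a pipeline would call.

    Hypothetical helper: it mirrors the documented requirements only.
    """
    *intermediate, (final_name, final) = steps
    for name, step in intermediate:
        has_combined = hasattr(step, "partial_fit_transform")
        has_separate = hasattr(step, "partial_fit") and hasattr(step, "transform")
        if not (has_combined or has_separate):
            raise TypeError(
                f"Step '{name}' must implement partial_fit and transform, "
                "or partial_fit_transform."
            )
    if not (hasattr(final, "partial_fit") and hasattr(final, "predict")):
        raise TypeError(
            f"Final step '{final_name}' must implement partial_fit and predict."
        )


class NoPredict:
    """Hypothetical learner that is missing predict."""

    def partial_fit(self, X, y, classes=None):
        return self


try:
    check_steps([("broken", NoPredict())])
    error_message = None
except TypeError as error:
    error_message = str(error)
```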

Notes

This code is an experimental feature. Use with caution.

Examples

>>> # Imports
>>> from skmultiflow.lazy import KNNAdwin
>>> from skmultiflow.core import Pipeline
>>> from skmultiflow.data import FileStream
>>> from skmultiflow.evaluation import EvaluatePrequential
>>> from skmultiflow.transform import OneHotToCategorical
>>> # Setting up the stream
>>> stream = FileStream("skmultiflow/data/datasets/covtype.csv")
>>> stream.prepare_for_use()
>>> transform = OneHotToCategorical([[10, 11, 12, 13],
... [14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35,
... 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53]])
>>> # Setting up the classifier
>>> classifier = KNNAdwin(n_neighbors=8, max_window_size=2000, leaf_size=40)
>>> # Set up the pipeline
>>> pipe = Pipeline([('transform', transform), ('knn_adwin', classifier)])
>>> # Set up the evaluator
>>> evaluator = EvaluatePrequential(show_plot=True, pretrain_size=1000, max_samples=500000)
>>> # Evaluate
>>> evaluator.evaluate(stream=stream, model=pipe)
__init__(steps)

Initialize self. See help(type(self)) for accurate signature.

Methods

__init__(steps)

Initialize self.

fit(X, y)

Sequentially fit and transform data in all but last step, then fit the model in last step.

get_info()

Collects and returns the information about the configuration of the estimator.

get_params([deep])

Get parameters for this estimator.

named_steps()

Generates a dictionary to access all the steps’ properties.

partial_fit(X, y[, classes])

Sequentially partial fit and transform data in all but last step, then partial fit data in last step.

partial_fit_predict(X, y)

Partial fits and transforms data in all but the last step, then partial fits and predicts in the last step.

partial_fit_transform(X[, y])

Partial fits and transforms data in all but the last step, then partial fits the last step.

predict(X)

Sequentially applies all transforms and then predicts with the last step.

reset()

Resets the estimator to its initial state.

set_params(**params)

Set the parameters of this estimator.

fit(X, y)

Sequentially fit and transform data in all but last step, then fit the model in last step.

Parameters
  • X (numpy.ndarray of shape (n_samples, n_features)) – The data upon which the transforms/estimator will create their model.

  • y (An array_like object of length n_samples) – Contains the true class labels for all the samples in X.

Returns

self

Return type

Pipeline

get_info()

Collects and returns the information about the configuration of the estimator.

Returns

Configuration of the estimator.

Return type

string

get_params(deep=True)

Get parameters for this estimator.

Parameters

deep (boolean, optional) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns

params – Parameter names mapped to their values.

Return type

mapping of string to any

named_steps()

Generates a dictionary to access all the steps’ properties.

Returns

A steps dictionary, so that each step can be accessed by name.

Return type

dictionary
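A minimal sketch of what such a dictionary looks like, assuming it is simply built from the ('name', estimator) tuples in steps. The placeholder objects below are hypothetical and stand in for a real transform and learner.

```python
# Placeholder objects stand in for a real transform and learner.
transform, classifier = object(), object()
steps = [("transform", transform), ("classifier", classifier)]

# Mapping from step name to step object, as a plain dict.
steps_by_name = dict(steps)
selected = steps_by_name["classifier"]
```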

partial_fit(X, y, classes=None)

Sequentially partial fit and transform data in all but last step, then partial fit data in last step.

Parameters
  • X (numpy.ndarray of shape (n_samples, n_features)) – The features to train the model.

  • y (numpy.ndarray of shape (n_samples)) – An array-like with the class labels of all samples in X.

  • classes (numpy.ndarray) – Array with all possible/known class labels. This is an optional parameter, except for the first partial_fit call where it is compulsory.

Returns

self

Return type

Pipeline
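The rule that classes is compulsory on the first partial_fit call and optional afterwards can be illustrated with a toy learner. `CountingLearner` is hypothetical and is not a scikit-multiflow class; it only shows the calling pattern over successive batches.

```python
class CountingLearner:
    """Hypothetical incremental learner that needs `classes` on its first call."""

    def __init__(self):
        self.classes = None
        self.counts = {}

    def partial_fit(self, X, y, classes=None):
        if self.classes is None:
            if classes is None:
                raise ValueError("classes must be given on the first partial_fit call")
            # Remember every label up front, including ones not seen yet.
            self.classes = list(classes)
            self.counts = {label: 0 for label in self.classes}
        for label in y:
            self.counts[label] += 1
        return self


learner = CountingLearner()
batches = [([[0.1], [0.4]], [0, 1]), ([[0.9], [0.7]], [2, 2])]
for index, (X, y) in enumerate(batches):
    # Only the first batch has to carry the full set of labels.
    learner.partial_fit(X, y, classes=[0, 1, 2] if index == 0 else None)
```

Declaring all labels on the first call matters in a streaming setting because the first batch may not contain every class, as with label 2 here.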

partial_fit_predict(X, y)

Partial fits and transforms data in all but the last step, then partial fits and predicts in the last step.

Parameters
  • X (numpy.ndarray of shape (n_samples, n_features)) – All the samples we want to predict the label for.

  • y (An array_like object of length n_samples) – Contains the true class labels for all the samples in X.

Returns

The predicted class label for all the samples in X.

Return type

list

partial_fit_transform(X, y=None)

Partial fits and transforms data in all but the last step, then partial fits the last step.

Parameters
  • X (numpy.ndarray of shape (n_samples, n_features)) – The data upon which the transforms/estimator will create their model.

  • y (An array_like object of length n_samples) – Contains the true class labels for all the samples in X.

Returns

self

Return type

Pipeline

predict(X)

Sequentially applies all transforms and then predicts with the last step.

Parameters

X (numpy.ndarray of shape (n_samples, n_features)) – All the samples we want to predict the label for.

Returns

The predicted class label for all the samples in X.

Return type

list

reset()

Resets the estimator to its initial state.

Returns

self

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Returns

self
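The <component>__<parameter> naming convention can be sketched as a simple key split. The helper `split_nested_params` is hypothetical and is not the library's implementation; the step name `classifier` is likewise only an example, while n_neighbors and leaf_size are taken from the KNNAdwin call in the Examples section.

```python
def split_nested_params(params):
    """Group {'component__parameter': value} entries by component name."""
    own, nested = {}, {}
    for key, value in params.items():
        name, separator, sub_key = key.partition("__")
        if separator:
            # 'classifier__leaf_size' -> nested['classifier']['leaf_size']
            nested.setdefault(name, {})[sub_key] = value
        else:
            # No double underscore: the parameter belongs to the object itself.
            own[key] = value
    return own, nested


own, nested = split_nested_params(
    {"classifier__n_neighbors": 5, "classifier__leaf_size": 30}
)
```

Under this convention, each group in `nested` would then be forwarded to the set_params method of the matching component.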