# skmultiflow.core.Pipeline

class skmultiflow.core.Pipeline(steps)

[Experimental] Holds a set of sequential operations (transforms), followed by a single estimator.

It makes it easy to process datasets that require several transformations before being used by a learner. It also allows several steps to be cross-validated together.

Each intermediate step should extend the BaseTransform class, or at least implement either both the transform and partial_fit methods or the partial_fit_transform method.

The last step must be an estimator (learner), so it should implement at least partial_fit and predict.

Because its last step is an estimator, the Pipeline behaves like an estimator itself: it can be passed directly to evaluation objects as if it were a learner.
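An evaluator only needs to call `predict` and `partial_fit` on the model it receives, which is why a Pipeline can take a learner's place. The sketch below illustrates that duck typing with a hypothetical test-then-train loop, `prequential_accuracy`, and a toy `MajorityClass` learner; neither is part of scikit-multiflow, and `EvaluatePrequential` does considerably more (metrics, pretraining, plotting).

```python
from collections import Counter


class MajorityClass:
    """Hypothetical toy learner: predicts the most frequent label seen so far."""

    def __init__(self):
        self.counts = Counter()

    def partial_fit(self, X, y, classes=None):
        self.counts.update(y)
        return self

    def predict(self, X):
        if not self.counts:
            return [None] * len(X)  # nothing learned yet
        label = self.counts.most_common(1)[0][0]
        return [label] * len(X)


def prequential_accuracy(model, samples):
    """Hypothetical test-then-train loop relying only on predict and partial_fit."""
    correct = 0
    for x, y in samples:
        # Test first: predict before the model has seen this sample's label.
        if model.predict([x])[0] == y:
            correct += 1
        # Then train on the same sample.
        model.partial_fit([x], [y])
    return correct / len(samples)


samples = [([0.1], 1), ([0.2], 1), ([0.3], 0), ([0.4], 1)]
print(prequential_accuracy(MajorityClass(), samples))  # 0.5
```

Any object exposing these two methods, a Pipeline included, could be passed as `model`.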

Parameters

steps (list of tuple) – List of tuples containing the transforms and the final estimator. Transforms are optional, but the estimator is required. Each tuple should have the form (‘name’, estimator).

Raises
• TypeError – If the intermediate steps or the final estimator do not implement the functions necessary for the pipeline to work.

• NotImplementedError – Some of the functions are not yet implemented and raise this error when called.
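The step requirements can be checked with `hasattr` before any data flows through the pipeline. This is a minimal sketch of such a check, assuming the documented requirements are the only ones; `validate_steps` and the toy classes are hypothetical, and the library's own check and error messages may differ.

```python
def validate_steps(steps):
    """Hypothetical check mirroring the documented contract (not the library's code)."""
    if not steps:
        raise TypeError("steps must contain at least a final estimator")
    for name, step in steps[:-1]:
        has_pair = hasattr(step, "transform") and hasattr(step, "partial_fit")
        if not (has_pair or hasattr(step, "partial_fit_transform")):
            raise TypeError(
                f"Intermediate step {name!r} must implement transform and "
                "partial_fit, or partial_fit_transform"
            )
    name, estimator = steps[-1]
    if not (hasattr(estimator, "partial_fit") and hasattr(estimator, "predict")):
        raise TypeError(f"Final step {name!r} must implement partial_fit and predict")


class Identity:
    """Toy transform: returns its input unchanged."""

    def partial_fit(self, X, y=None):
        return self

    def transform(self, X):
        return X


class ConstantLearner:
    """Toy learner: always predicts 0."""

    def partial_fit(self, X, y, classes=None):
        return self

    def predict(self, X):
        return [0] * len(X)


validate_steps([("identity", Identity()), ("learner", ConstantLearner())])  # passes

try:
    validate_steps([("identity", Identity()), ("not_a_learner", Identity())])
except TypeError as err:
    print(err)  # Final step 'not_a_learner' must implement partial_fit and predict
```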

Notes

This code is an experimental feature. Use with caution.

Examples

>>> # Imports
>>> from skmultiflow.core import Pipeline
>>> from skmultiflow.data import FileStream
>>> from skmultiflow.evaluation import EvaluatePrequential
>>> from skmultiflow.lazy import KNNAdwin
>>> from skmultiflow.transform import OneHotToCategorical
>>> # Setting up the stream
>>> stream = FileStream("skmultiflow/data/datasets/covtype.csv")
>>> stream.prepare_for_use()
>>> # Collapse each group of one-hot encoded columns into a single categorical feature
>>> transform = OneHotToCategorical([[10, 11, 12, 13],
...     [14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35,
...     36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53]])
>>> # Setting up the classifier
>>> classifier = KNNAdwin(n_neighbors=8, max_window_size=2000, leaf_size=40)
>>> # Setup the pipeline
>>> pipe = Pipeline([('transform', transform), ('knn_adwin', classifier)])
>>> # Setup the evaluator
>>> evaluator = EvaluatePrequential(show_plot=True, pretrain_size=1000, max_samples=500000)
>>> # Evaluate
>>> evaluator.evaluate(stream=stream, model=pipe)
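Each inner list passed to `OneHotToCategorical` in the example names the columns of one one-hot encoded attribute. The plain-Python sketch below shows the idea on a single sample; the hypothetical `one_hot_to_categorical` function, its output layout (ungrouped columns first, then one categorical column per group) and the `-1` used when no column is active are assumptions, not the transform's documented behavior.

```python
def one_hot_to_categorical(sample, groups):
    """Collapse each one-hot group of columns into the index of its active column.

    Hypothetical sketch: columns outside every group are kept in order and one
    categorical column per group is appended at the end.
    """
    grouped = {col for group in groups for col in group}
    kept = [value for col, value in enumerate(sample) if col not in grouped]
    categorical = []
    for group in groups:
        # Position (within the group) of the column set to 1; -1 if none is set (assumption).
        active = [i for i, col in enumerate(group) if sample[col] == 1]
        categorical.append(active[0] if active else -1)
    return kept + categorical


# Two numeric features, then a 3-column one-hot group and a 2-column one.
sample = [5.0, 7.0, 0, 1, 0, 1, 0]
print(one_hot_to_categorical(sample, [[2, 3, 4], [5, 6]]))  # [5.0, 7.0, 1, 0]
```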

__init__(steps)

Initialize self. See help(type(self)) for accurate signature.

Methods

• __init__(steps) – Initialize self.

• fit(X, y) – Sequentially fit and transform data in all but the last step, then fit the model in the last step.

• get_info() – Collects and returns information about the configuration of the estimator.

• get_params([deep]) – Get parameters for this estimator.

• named_steps() – Generates a dictionary to access all the steps’ properties.

• partial_fit(X, y[, classes]) – Sequentially partial fit and transform data in all but the last step, then partial fit data in the last step.

• partial_fit_predict(X, y) – Calls partial_fit and transform on all but the last step, then partial_fit and predict on the last step.

• partial_fit_transform(X[, y]) – Calls partial_fit and transform on all but the last step, then partial_fit on the last step.

• predict(X) – Sequentially applies all transforms, then predicts with the last step.

• reset() – Resets the estimator to its initial state.

• set_params(**params) – Set the parameters of this estimator.

fit(X, y)

Sequentially fit and transform data in all but last step, then fit the model in last step.

Parameters
• X (numpy.ndarray of shape (n_samples, n_features)) – The data upon which the transforms/estimator will create their model.

• y (An array_like object of length n_samples) – Contains the true class labels for all the samples in X.

Returns

self

Return type

Pipeline
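The fit sequence can be sketched as a loop over the intermediate steps followed by a single call on the final estimator. The `pipeline_fit` function and toy classes below are hypothetical; the sketch assumes a transform exposes either `fit_transform` or the `fit`/`transform` pair.

```python
def pipeline_fit(steps, X, y):
    """Hypothetical sketch: fit and transform in all but the last step, then fit the last."""
    Xt = X
    for _, transform in steps[:-1]:
        if hasattr(transform, "fit_transform"):
            Xt = transform.fit_transform(Xt, y)
        else:
            Xt = transform.fit(Xt, y).transform(Xt)
    steps[-1][1].fit(Xt, y)


class MeanCenter:
    """Toy transform: subtracts the per-column mean learned in fit."""

    def fit(self, X, y=None):
        n = len(X)
        self.means = [sum(col) / n for col in zip(*X)]
        return self

    def transform(self, X):
        return [[v - m for v, m in zip(row, self.means)] for row in X]


class RecordingLearner:
    """Toy estimator that stores the (already transformed) data it was fitted on."""

    def fit(self, X, y):
        self.seen_X = X
        return self


steps = [("center", MeanCenter()), ("learner", RecordingLearner())]
pipeline_fit(steps, [[1.0, 10.0], [3.0, 20.0]], [0, 1])
print(steps[1][1].seen_X)  # [[-1.0, -5.0], [1.0, 5.0]]
```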

get_info()

Collects and returns information about the configuration of the estimator.

Returns

Configuration of the estimator.

Return type

string

get_params(deep=True)

Get parameters for this estimator.

Parameters

deep (boolean, optional) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns

params – Parameter names mapped to their values.

Return type

mapping of string to any
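The `deep` flag follows the scikit-learn convention: parameters of nested estimators are reported under `<component>__<parameter>` keys. The sketch below imitates that convention with a hypothetical `Estimator` base class that reads constructor arguments via `inspect.signature`; it is not the scikit-multiflow implementation, and the exact keys `Pipeline.get_params` returns may differ.

```python
import inspect


class Estimator:
    """Hypothetical base class: an estimator's parameters are its constructor arguments."""

    def get_params(self, deep=True):
        params = {}
        for name in inspect.signature(type(self).__init__).parameters:
            if name == "self":
                continue
            value = getattr(self, name)
            params[name] = value
            # Nested estimators contribute their own parameters as name__sub_name.
            if deep and isinstance(value, Estimator):
                for sub_name, sub_value in value.get_params(deep=True).items():
                    params[f"{name}__{sub_name}"] = sub_value
        return params


class Scaler(Estimator):
    def __init__(self, factor=1.0):
        self.factor = factor


class Wrapper(Estimator):
    def __init__(self, inner=None, window=100):
        self.inner = inner
        self.window = window


model = Wrapper(inner=Scaler(factor=2.0), window=50)
print(sorted(model.get_params(deep=True)))   # ['inner', 'inner__factor', 'window']
print(sorted(model.get_params(deep=False)))  # ['inner', 'window']
```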

named_steps()

Generates a dictionary to access all the steps’ properties.

Returns

A steps dictionary, so that each step can be accessed by name.

Return type

dictionary
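Since each step is a `('name', object)` tuple, a name-to-object mapping like the one described can be built with `dict(steps)`. A minimal sketch, assuming step names are unique (with duplicates, `dict` silently keeps only the last object); the `Step` class is a hypothetical stand-in for real transforms and learners.

```python
class Step:
    """Hypothetical stand-in for a transform or learner, carrying one attribute."""

    def __init__(self, label):
        self.label = label


steps = [("transform", Step("one-hot")), ("knn", Step("classifier"))]

# Name -> object mapping; dicts keep insertion order, so the step order is preserved.
named = dict(steps)
print(named["knn"].label)  # classifier
print(list(named))         # ['transform', 'knn']
```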

partial_fit(X, y, classes=None)

Sequentially partial fit and transform data in all but last step, then partial fit data in last step.

Parameters
• X (numpy.ndarray of shape (n_samples, n_features)) – The features to train the model.

• y (numpy.ndarray of shape (n_samples)) – An array-like with the class labels of all samples in X.

• classes (numpy.ndarray) – Array with all possible/known class labels. This is an optional parameter, except for the first partial_fit call where it is compulsory.

Returns

self

Return type

Pipeline
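A typical streaming loop calls `partial_fit` once per mini-batch and supplies `classes` only on the first call. The toy `StrictLearner` below is hypothetical and simply enforces that rule; with a real Pipeline the loop body would be the documented call `pipe.partial_fit(X_batch, y_batch, classes=...)`.

```python
class StrictLearner:
    """Hypothetical learner that, like the documented partial_fit, needs classes first."""

    def __init__(self):
        self.classes = None
        self.n_seen = 0

    def partial_fit(self, X, y, classes=None):
        if self.classes is None:
            if classes is None:
                raise ValueError("classes must be given on the first partial_fit call")
            self.classes = list(classes)
        self.n_seen += len(X)
        return self


batches = [([[0.1], [0.9]], [0, 1]), ([[0.4]], [0]), ([[0.7], [0.8]], [1, 1])]
learner = StrictLearner()
for i, (X_batch, y_batch) in enumerate(batches):
    # Pass the full label set only on the first call; later calls may omit it.
    learner.partial_fit(X_batch, y_batch, classes=[0, 1] if i == 0 else None)
print(learner.classes, learner.n_seen)  # [0, 1] 5
```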

partial_fit_predict(X, y)

Calls partial_fit and transform on all but the last step, then partial_fit and predict on the last step.

Parameters
• X (numpy.ndarray of shape (n_samples, n_features)) – All the samples we want to predict the label for.

• y (An array_like object of length n_samples) – Contains the true class labels for all the samples in X.

Returns

The predicted class label for all the samples in X.

Return type

list
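If the last step is updated before it predicts, the labels in `y` can already influence the predictions returned for the same batch, unlike a test-then-train evaluation. The sketch below assumes that ordering and uses hypothetical toy classes; it is not the library's code.

```python
from collections import Counter


class Double:
    """Hypothetical stateless transform: doubles every value."""

    def partial_fit(self, X, y=None):
        return self

    def transform(self, X):
        return [[2 * v for v in row] for row in X]


class Majority:
    """Hypothetical learner: predicts the most frequent label seen so far."""

    def __init__(self):
        self.counts = Counter()

    def partial_fit(self, X, y, classes=None):
        self.counts.update(y)
        return self

    def predict(self, X):
        label = self.counts.most_common(1)[0][0]
        return [label] * len(X)


def partial_fit_predict(steps, X, y):
    """Hypothetical sketch: update every step with (X, y), then predict on X."""
    Xt = X
    for _, transform in steps[:-1]:
        Xt = transform.partial_fit(Xt, y).transform(Xt)
    estimator = steps[-1][1]
    estimator.partial_fit(Xt, y)
    # The prediction happens after the update, so this batch's labels were already seen.
    return estimator.predict(Xt)


steps = [("double", Double()), ("majority", Majority())]
print(partial_fit_predict(steps, [[1], [2], [3]], [1, 1, 0]))  # [1, 1, 1]
```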

partial_fit_transform(X, y=None)

Calls partial_fit and transform on all but the last step, then partial_fit on the last step.

Parameters
• X (numpy.ndarray of shape (n_samples, n_features)) – The data upon which the transforms/estimator will create their model.

• y (An array_like object of length n_samples) – Contains the true class labels for all the samples in X.

Returns

self

Return type

Pipeline
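Since an intermediate step may offer either `partial_fit_transform` or the `partial_fit`/`transform` pair, each step's update can be dispatched on whichever is available. A minimal sketch, assuming `partial_fit` returns the transform itself; `update_and_transform`, `RunningMaxScaler` and `AddOne` are hypothetical.

```python
def update_and_transform(transform, X, y=None):
    """Hypothetical helper: prefer partial_fit_transform, else partial_fit then transform."""
    if hasattr(transform, "partial_fit_transform"):
        return transform.partial_fit_transform(X, y)
    return transform.partial_fit(X, y).transform(X)


class RunningMaxScaler:
    """Toy transform: divides each value by the largest value seen so far in its column.

    Assumes strictly positive inputs so the divisor is never zero.
    """

    def __init__(self):
        self.max_ = None

    def partial_fit(self, X, y=None):
        batch_max = [max(col) for col in zip(*X)]
        if self.max_ is None:
            self.max_ = batch_max
        else:
            self.max_ = [max(a, b) for a, b in zip(self.max_, batch_max)]
        return self

    def transform(self, X):
        return [[v / m for v, m in zip(row, self.max_)] for row in X]


class AddOne:
    """Toy transform that only offers the combined partial_fit_transform."""

    def partial_fit_transform(self, X, y=None):
        return [[v + 1 for v in row] for row in X]


scaler = RunningMaxScaler()
print(update_and_transform(scaler, [[2.0, 4.0]]))  # [[1.0, 1.0]]
print(update_and_transform(scaler, [[1.0, 8.0]]))  # [[0.5, 1.0]]
print(update_and_transform(AddOne(), [[1, 2]]))    # [[2, 3]]
```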

predict(X)

Sequentially applies all transforms, then predicts with the last step.

Parameters

X (numpy.ndarray of shape (n_samples, n_features)) – All the samples we want to predict the label for.

Returns

The predicted class label for all the samples in X.

Return type

list
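As described, prediction only applies the transforms; this sketch assumes they are not updated in the process, so their learned state stays fixed. A hypothetical `CountingTransform` records how often it is updated to make that visible; `pipeline_predict` and `ThresholdLearner` are illustrations, not the library's implementation.

```python
class CountingTransform:
    """Hypothetical transform that counts its updates and multiplies values by 10."""

    def __init__(self):
        self.updates = 0

    def partial_fit(self, X, y=None):
        self.updates += 1
        return self

    def transform(self, X):
        return [[v * 10 for v in row] for row in X]


class ThresholdLearner:
    """Hypothetical estimator: label 1 when the first feature exceeds 15."""

    def partial_fit(self, X, y, classes=None):
        return self

    def predict(self, X):
        return [int(row[0] > 15) for row in X]


def pipeline_predict(steps, X):
    """Hypothetical sketch: apply every transform (no fitting), then predict."""
    Xt = X
    for _, transform in steps[:-1]:
        Xt = transform.transform(Xt)
    return steps[-1][1].predict(Xt)


counter = CountingTransform()
steps = [("scale", counter), ("threshold", ThresholdLearner())]
print(pipeline_predict(steps, [[1.0], [2.0]]))  # [0, 1]
print(counter.updates)                          # 0
```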

reset()

Resets the estimator to its initial state.

Returns

self

Return type

Pipeline
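One way to picture resetting a pipeline is to reset each step in turn. This sketch assumes that every step providing a `reset` method should be reset; how `Pipeline.reset` treats steps without one is not documented here. `RunningMean` and `reset_steps` are hypothetical.

```python
class RunningMean:
    """Hypothetical transform whose learned state can be cleared with reset()."""

    def __init__(self):
        self.reset()

    def reset(self):
        self.total, self.count = 0.0, 0
        return self

    def partial_fit(self, X, y=None):
        for row in X:
            self.total += row[0]
            self.count += 1
        return self


def reset_steps(steps):
    """Hypothetical sketch: call reset() on every step that provides it."""
    for _, step in steps:
        if hasattr(step, "reset"):
            step.reset()
    return steps


mean = RunningMean().partial_fit([[2.0], [4.0]])
reset_steps([("mean", mean)])
print(mean.total, mean.count)  # 0.0 0
```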

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Returns

self

Return type

Pipeline
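The `<component>__<parameter>` form can be routed by splitting each key at the first double underscore and setting the attribute on the named step. A minimal sketch with hypothetical `set_params` and `KNNLike` helpers; unlike a full implementation it handles only one level of nesting and does not check that the step actually has the named parameter.

```python
def set_params(steps, **params):
    """Hypothetical sketch: route '<component>__<parameter>' keys to the named step."""
    named = dict(steps)
    for key, value in params.items():
        # Split at the first "__" only: one level of nesting is supported.
        component, sep, parameter = key.partition("__")
        if not sep or component not in named:
            raise ValueError(f"Invalid parameter {key!r}")
        setattr(named[component], parameter, value)
    return steps


class KNNLike:
    """Hypothetical estimator; the attribute name is illustrative only."""

    def __init__(self, n_neighbors=5):
        self.n_neighbors = n_neighbors


knn = KNNLike()
set_params([("knn", knn)], knn__n_neighbors=8)
print(knn.n_neighbors)  # 8
```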