# skmultiflow.data.DataStream

class skmultiflow.data.DataStream(data, y=None, target_idx=-1, n_targets=1, cat_features=None, name=None)

Creates a stream from a data source.

DataStream accepts either the whole data set, containing both X (features) and Y (targets), or X and Y separately. In the first case, target_idx and n_targets specify which columns hold the targets; in the second case, they are not needed.

Parameters
• data (np.ndarray or pd.DataFrame) – The features’ columns and targets’ columns, or only the features’ columns if the targets are passed separately through y.

• y (np.ndarray or pd.DataFrame, optional (default=None)) – The targets’ columns.

• target_idx (int, optional (default=-1)) – The column index from which the targets start.

• n_targets (int, optional (default=1)) – The number of targets.

• cat_features (list, optional (default=None)) – A list of indices corresponding to the location of categorical features.

• name (str, optional (default=None)) – A name used to identify the data.
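
When features and targets arrive in one array, target_idx and n_targets decide which columns become Y. The sketch below is a plain-Python illustration of that split, not DataStream's own code. It assumes the targets occupy n_targets consecutive columns starting at target_idx, that a negative target_idx counts from the last column (so the defaults select the last column as the single target), and that every other column is a feature. The helper split_features_targets is hypothetical.

```python
from typing import List, Sequence, Tuple


def split_features_targets(
    rows: Sequence[Sequence[float]], target_idx: int = -1, n_targets: int = 1
) -> Tuple[List[List[float]], List[List[float]]]:
    """Hypothetical helper: split each row into (features, targets).

    Assumes the targets are n_targets consecutive columns starting at
    target_idx; a negative target_idx counts from the end of the row.
    """
    n_cols = len(rows[0])
    if not -n_cols <= target_idx < n_cols:
        raise IndexError("target_idx is outside the row")
    start = target_idx % n_cols  # e.g. -1 -> n_cols - 1, the last column
    stop = start + n_targets
    if stop > n_cols:
        raise ValueError("the target columns run past the end of the row")
    X = [list(row[:start]) + list(row[stop:]) for row in rows]
    y = [list(row[start:stop]) for row in rows]
    return X, y


data = [
    [0.1, 0.2, 5.0, 1],
    [0.3, 0.4, 6.0, 0],
]

# Defaults: the last column is the single target.
X, y = split_features_targets(data)
print(X)  # [[0.1, 0.2, 5.0], [0.3, 0.4, 6.0]]
print(y)  # [[1], [0]]

# Two targets starting at column 2.
X2, y2 = split_features_targets(data, target_idx=2, n_targets=2)
print(X2)  # [[0.1, 0.2], [0.3, 0.4]]
print(y2)  # [[5.0, 1], [6.0, 0]]
```

When y is passed separately, no split of this kind is needed.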

Notes

On request, the stream object provides a number of samples, in such a way that previously returned samples cannot be accessed again later. This correctly simulates a streaming context, where past samples are no longer available.
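
The forward-only access described in the note can be pictured with an ordinary Python generator: every sample is handed out once, in order, and there is no index for going back. This is only an analogy; forward_only_stream is a hypothetical name, and DataStream exposes the same idea through methods such as next_sample() rather than through a generator.

```python
from typing import Iterator, List, Sequence, Tuple


def forward_only_stream(
    X: Sequence[List[float]], y: Sequence[int]
) -> Iterator[Tuple[List[float], int]]:
    """Hypothetical illustration: hand out each (features, target) pair once, in order."""
    for features, target in zip(X, y):
        yield features, target


X = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
y = [0, 1, 1]

stream = forward_only_stream(X, y)
first = next(stream)
second = next(stream)

# There is no way to ask the iterator for `first` again; only unread samples remain.
remaining = list(stream)
print(first)      # ([0.0, 1.0], 0)
print(remaining)  # [([1.0, 1.0], 1)]
```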

__init__(data, y=None, target_idx=-1, n_targets=1, cat_features=None, name=None)

Initialize self. See help(type(self)) for accurate signature.

Methods

• __init__(data[, y, target_idx, n_targets, …]) – Initialize self.
• get_data_info() – Retrieves minimal information about the stream.
• get_info() – Collects and returns information about the configuration of the estimator.
• get_params([deep]) – Get parameters for this estimator.
• has_more_samples() – Checks whether the stream has more samples.
• is_restartable() – Determines whether the stream is restartable.
• last_sample() – Retrieves the last batch_size samples returned by the stream.
• n_remaining_samples() – Returns the estimated number of remaining samples.
• next_sample([batch_size]) – If there are enough instances to supply at least batch_size samples, those samples are returned.
• prepare_for_use() – Prepares the stream for use.
• print_df() – Prints all the samples in the stream.
• reset() – Resets the estimator to its initial state.
• restart() – Restarts the stream’s sample feeding, while keeping all of its parameters.
• set_params(**params) – Set the parameters of this estimator.

Attributes

• X – Return the features’ columns.
• cat_features_idx – Get the indices of the categorical features.
• data – Return the data set used to generate the stream.
• feature_names – Retrieve the names of the features.
• n_cat_features – Retrieve the number of categorical features.
• n_features – Retrieve the number of features.
• n_num_features – Retrieve the number of numerical features.
• n_targets – Get the number of targets.
• target_idx – Get the index of the column where Y begins.
• target_names – Retrieve the names of the targets.
• target_values – Retrieve all target values in the stream for each target.
• y – Return the targets’ columns.

X

Return the features’ columns.

Returns

The features’ columns.

Return type

np.ndarray

cat_features_idx

Get the indices of the categorical features.

Returns

List of the categorical features’ indices.

Return type

list

data

Return the data set used to generate the stream.

Returns

Data set.

Return type

pd.DataFrame

feature_names

Retrieve the names of the features.

Returns

The names of the features.

Return type

list

get_data_info()

Retrieves minimal information about the stream.

Used by evaluator methods to identify the stream.

The default format is: ‘Stream name - n_targets, n_classes, n_features’.

Returns

Stream data information

Return type

string

get_info()

Collects and returns information about the configuration of the estimator.

Returns

Configuration of the estimator.

Return type

string

get_params(deep=True)

Get parameters for this estimator.

Parameters

deep (boolean, optional) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns

params – Parameter names mapped to their values.

Return type

mapping of string to any

has_more_samples()

Checks whether the stream has more samples.

Returns

True if the stream has more samples.

Return type

Boolean

is_restartable()

Determines whether the stream is restartable.

Returns

True if the stream is restartable.

Return type

Boolean

last_sample()

Retrieves the last batch_size samples returned by the stream, that is, the batch most recently served by next_sample().

Returns

A numpy.ndarray of shape (batch_size, n_features) and an array-like of shape (batch_size, n_targets), representing the last batch_size samples.

Return type

tuple or tuple list

n_cat_features

Retrieve the number of categorical features.

Returns

The number of categorical features in the stream.

Return type

int

n_features

Retrieve the number of features.

Returns

The total number of features.

Return type

int

n_num_features

Retrieve the number of numerical features.

Returns

The number of numerical features in the stream.

Return type

int
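
The three feature counts fit together as follows, assuming that every feature not listed in cat_features is counted as numerical, so that n_num_features = n_features - n_cat_features. The column names and indices below are made up for illustration.

```python
# Hypothetical stream with 5 feature columns, two of them categorical.
feature_names = ["age", "colour", "height", "city", "weight"]
cat_features = [1, 3]  # indices of the categorical columns

n_features = len(feature_names)
n_cat_features = len(cat_features)
# Assumption: every feature that is not categorical counts as numerical.
n_num_features = n_features - n_cat_features

print(n_features, n_cat_features, n_num_features)  # 5 2 3
```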

n_remaining_samples()

Returns the estimated number of remaining samples.

Returns

Remaining number of samples.

Return type

int

n_targets

Get the number of targets.

Returns

The number of targets.

Return type

int

next_sample(batch_size=1)

If there are enough instances to supply at least batch_size samples, those samples are returned. Otherwise, the tuple (None, None) is returned.

Parameters

batch_size (int) – The number of instances to return.

Returns

The next batch_size instances, as a tuple of features and targets. For general purposes, each element of the tuple can be treated as a numpy.ndarray.

Return type

tuple or tuple list
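
The interplay of next_sample(), has_more_samples(), n_remaining_samples() and restart() can be sketched with a single cursor over the data. ToyStream is a hypothetical, simplified stand-in, not the skmultiflow implementation; it follows the behaviour described on this page, returning (None, None) when fewer than batch_size samples remain, so edge cases are worth checking against the installed version.

```python
from typing import List, Optional, Sequence, Tuple

Batch = Tuple[Optional[List[List[float]]], Optional[List[int]]]


class ToyStream:
    """Hypothetical, simplified model of the documented stream methods.

    Not the skmultiflow implementation: data is kept in plain lists and a
    single cursor (sample_idx) marks the next unread sample.
    """

    def __init__(self, X: Sequence[Sequence[float]], y: Sequence[int]) -> None:
        self._X = [list(row) for row in X]
        self._y = list(y)
        self.sample_idx = 0  # index of the next unread sample

    def n_remaining_samples(self) -> int:
        return len(self._y) - self.sample_idx

    def has_more_samples(self) -> bool:
        return self.n_remaining_samples() > 0

    def next_sample(self, batch_size: int = 1) -> Batch:
        # As described above: not enough samples left -> (None, None).
        if self.n_remaining_samples() < batch_size:
            return None, None
        start = self.sample_idx
        self.sample_idx += batch_size  # consumed samples are not served again
        return self._X[start:self.sample_idx], self._y[start:self.sample_idx]

    def restart(self) -> None:
        # Rewind to the first sample; the data itself is kept.
        self.sample_idx = 0


stream = ToyStream(X=[[0.0], [1.0], [2.0], [3.0], [4.0]], y=[0, 1, 0, 1, 0])

batches = []
while stream.has_more_samples():
    X_batch, y_batch = stream.next_sample(batch_size=2)
    if X_batch is None:  # fewer than batch_size samples were left
        break
    batches.append((X_batch, y_batch))

print(batches)  # [([[0.0], [1.0]], [0, 1]), ([[2.0], [3.0]], [0, 1])]
print(stream.n_remaining_samples())  # 1

stream.restart()
print(stream.n_remaining_samples())  # 5
```

Keeping one integer cursor is what makes old samples unreachable: next_sample() only ever moves it forward, and restart() is the only way to move it back.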

prepare_for_use()

Prepares the stream for use.

Notes

This function should always be called after the stream is initialized.

print_df()

Prints all the samples in the stream.

reset()

Resets the estimator to its initial state.

Returns

Return type

self

restart()

Restarts the stream’s sample feeding, while keeping all of its parameters.

In effect, it returns the stream to its initial state.

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Returns

Return type

self
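
The <component>__<parameter> convention can be illustrated by splitting each keyword on the double underscore and walking down the attribute chain. Component and set_nested_params are hypothetical; scikit-learn's BaseEstimator.set_params, for comparison, validates names against get_params() rather than with hasattr() as done here.

```python
from typing import Any


class Component:
    """Hypothetical object whose parameters are plain attributes."""

    def __init__(self, **params: Any) -> None:
        self.__dict__.update(params)


def set_nested_params(obj: Any, **params: Any) -> Any:
    """Hypothetical sketch of the <component>__<parameter> naming convention.

    'alpha=1' sets obj.alpha; 'model__alpha=1' sets obj.model.alpha.
    """
    for key, value in params.items():
        *path, name = key.split("__")
        target = obj
        for part in path:
            target = getattr(target, part)  # descend into the nested component
        if not hasattr(target, name):
            raise ValueError(f"Invalid parameter {key!r}")
        setattr(target, name, value)
    return obj  # returning the object mirrors the documented return type


pipeline = Component(name="demo", model=Component(alpha=0.1, max_iter=10))
set_nested_params(pipeline, name="renamed", model__alpha=0.5)
print(pipeline.name, pipeline.model.alpha, pipeline.model.max_iter)  # renamed 0.5 10
```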

target_idx

Get the index of the column where Y begins.

Returns

The index of the column where Y begins.

Return type

int

target_names

Retrieve the names of the targets.

Returns

The names of the targets in the stream.

Return type

list

target_values

Retrieve all target_values in the stream for each target.

Returns

A list containing, for each target, the list of its target values.

Return type

list
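
For a classification stream, one natural reading of "all target values for each target" is the set of distinct labels seen in each target column. The hypothetical helper below computes that with sorted sets; whether DataStream sorts the values, and what it returns for single-target or regression streams, are assumptions to check against the installed version.

```python
from typing import List, Sequence


def distinct_target_values(y: Sequence[Sequence[int]]) -> List[List[int]]:
    """Hypothetical: sorted distinct values of each target column."""
    n_targets = len(y[0])
    return [sorted({row[j] for row in y}) for j in range(n_targets)]


y = [
    [1, 0],
    [0, 2],
    [1, 1],
]
print(distinct_target_values(y))  # [[0, 1], [0, 1, 2]]
```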

y

Return the targets’ columns.

Returns

The targets’ columns.

Return type

np.ndarray