skmultiflow.data.data_stream

Classes

DataStream(data[, y, target_idx, n_targets, …])

A stream generated from the entries of a dataset (numpy array or pandas DataFrame).

class skmultiflow.data.data_stream.DataStream(data, y=None, target_idx=-1, n_targets=1, cat_features_idx=None)

A stream generated from the entries of a dataset (numpy array or pandas DataFrame).

On request, the stream provides a given number of samples, in such a way that samples already returned cannot be accessed again later. This allows a streaming context to be simulated correctly.

DataStream either takes the whole data set and separates it into X and y, or takes X and y separately. In the first case, target_idx and n_targets need to be provided to locate the target columns; in the second case they are not needed.
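To make the column split concrete, here is a minimal NumPy sketch of how a combined array could be divided into X and y when only data is given. The helper split_features_targets is hypothetical and not part of skmultiflow; it assumes the targets are the n_targets consecutive columns starting at target_idx (negative values counting from the right, as in Python indexing) and that every remaining column is a feature. DataStream's own handling may differ in detail.

```python
import numpy as np


def split_features_targets(data, target_idx=-1, n_targets=1):
    """Hypothetical helper: split a 2-D array into (X, y) column blocks."""
    data = np.asarray(data)
    n_cols = data.shape[1]
    # Negative target_idx counts from the right, like Python indexing.
    start = target_idx if target_idx >= 0 else n_cols + target_idx
    if n_targets < 1 or start < 0 or start + n_targets > n_cols:
        raise ValueError("target columns fall outside the data")
    target_cols = list(range(start, start + n_targets))
    # Every column that is not a target is treated as a feature.
    feature_cols = [c for c in range(n_cols) if c not in target_cols]
    X = data[:, feature_cols]
    y = data[:, target_cols]
    if n_targets == 1:
        y = y.ravel()  # a single target is returned as a 1-D array
    return X, y


data = np.array([[0.1, 1.5, 0],
                 [0.4, 2.0, 1],
                 [0.9, 0.3, 1]])
X, y = split_features_targets(data)  # defaults: the last column is the target
X2, y2 = split_features_targets(data, target_idx=0, n_targets=2)
```

With the defaults (target_idx=-1, n_targets=1), the last column becomes y, which is consistent with the documented parameter defaults.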

Parameters
  • data (np.ndarray or pd.DataFrame) – The feature columns and target columns together, or only the feature columns if the targets are passed separately through y.

  • y (np.ndarray or pd.DataFrame, optional (default=None)) – The target columns.

  • target_idx (int, optional (default=-1)) – The column index from which the targets start.

  • n_targets (int, optional (default=1)) – The number of targets.

  • cat_features_idx (list, optional (default=None)) – A list of indices corresponding to the location of categorical features.

X

Return the features’ columns.

Returns

the features’ columns

Return type

np.ndarray

cat_features_idx

Get the list of indices of the categorical features.

Returns

List of categorical feature indices.

Return type

list

data

Return the data set used to generate the stream.

Returns

Data set.

Return type

pd.DataFrame

feature_names

Retrieve the names of the features.

Returns

names of the features

Return type

list

get_class_type()

The class type is a string that identifies the type of object generated by that module.

Returns

The class type.

Return type

string

get_data_info()

get_name

Gets the name of the plot: a string that represents the stream in evaluation methods.

The default format is: ‘Stream name - x labels’.

Returns

A string representing the plot name.

Return type

string

get_info()

A sum-up of all important characteristics of a class.

The default format of the return string is as follows: ClassName: attribute_one: value_one - attribute_two: value_two - info_one: info_one_value

Returns

A string with the class’ relevant information.

Return type

string
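The documented layout of the get_info() string can be reproduced with ordinary string formatting. The helper format_info below is hypothetical and written only to illustrate the "ClassName: key: value - key: value" pattern; it is not DataStream.get_info, whose exact attributes and values are not listed on this page.

```python
def format_info(class_name, **attributes):
    """Hypothetical helper mimicking the 'ClassName: key: value - ...' layout."""
    # Keyword arguments keep their insertion order (Python 3.7+).
    parts = [f"{key}: {value}" for key, value in attributes.items()]
    return f"{class_name}: " + " - ".join(parts)


info = format_info("DataStream", n_targets=1, target_idx=-1)
# info == "DataStream: n_targets: 1 - target_idx: -1"
```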

has_more_samples()

Checks if the stream has more samples.

Returns

True if the stream has more samples.

Return type

Boolean

is_restartable()

Determine if the stream is restartable.

Returns

True if the stream is restartable.

Return type

Boolean

last_sample()

Retrieves the last batch_size samples returned by the stream.

Returns

A numpy.ndarray of shape (batch_size, n_features) and an array-like of shape (batch_size, n_targets), representing the last batch_size samples.

Return type

tuple or tuple list
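One plausible way to support last_sample() is to cache the batch produced by the most recent next_sample() call and hand it back without advancing the stream. CachingToyStream is a hypothetical, simplified class written for illustration, not skmultiflow's implementation; it assumes that "last batch_size samples" means the batch served by the latest read.

```python
import numpy as np


class CachingToyStream:
    """Hypothetical stream that remembers the batch it served most recently."""

    def __init__(self, X, y):
        self.X = np.asarray(X)
        self.y = np.asarray(y)
        self.sample_idx = 0
        self.current_x, self.current_y = None, None  # nothing served yet

    def next_sample(self, batch_size=1):
        if len(self.X) - self.sample_idx < batch_size:
            # Not enough data left: report (None, None) and cache that result.
            self.current_x, self.current_y = None, None
        else:
            start = self.sample_idx
            self.sample_idx += batch_size
            self.current_x = self.X[start:self.sample_idx]
            self.current_y = self.y[start:self.sample_idx]
        return self.current_x, self.current_y

    def last_sample(self):
        # Return the cached batch; the read position does not move.
        return self.current_x, self.current_y


stream = CachingToyStream([[1], [2], [3], [4]], [0, 1, 1, 0])
stream.next_sample(2)
X_last, y_last = stream.last_sample()
```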

n_cat_features

Retrieve the number of categorical features.

Returns

The number of categorical features in the stream.

Return type

int

n_features

Retrieve the number of features.

Returns

The total number of features.

Return type

int

n_num_features

Retrieve the number of numerical features.

Returns

The number of numerical features in the stream.

Return type

int
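Assuming every feature is either categorical or numerical, the three counts should satisfy n_features == n_cat_features + n_num_features, with n_cat_features equal to the number of distinct entries in cat_features_idx. The helper feature_counts is hypothetical and only illustrates that relationship; it is not skmultiflow code.

```python
def feature_counts(n_features, cat_features_idx=None):
    """Hypothetical helper: split a feature count into categorical and numerical parts."""
    cat = set(cat_features_idx or [])  # duplicate indices are counted once
    if any(i < 0 or i >= n_features for i in cat):
        raise ValueError("categorical feature index out of range")
    n_cat = len(cat)
    return n_cat, n_features - n_cat  # (n_cat_features, n_num_features)


n_cat, n_num = feature_counts(5, cat_features_idx=[0, 3])
# n_cat == 2, n_num == 3
```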

n_remaining_samples()

Returns the estimated number of remaining samples.

Returns

Remaining number of samples.

Return type

int
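For a finite in-memory dataset, the remaining count can be read as the total number of samples minus those already served; this is an assumption stated for illustration, not a description of DataStream internals. Combined with the documented (None, None) behaviour of next_sample, it also gives the number of complete batches that can still be drawn. full_batches_left is a hypothetical helper.

```python
def full_batches_left(n_samples, sample_idx, batch_size):
    """Hypothetical helper: remaining samples and complete batches still available."""
    remaining = n_samples - sample_idx  # samples not yet handed out
    return remaining, remaining // batch_size  # a partial batch is not counted


remaining, batches = full_batches_left(n_samples=1000, sample_idx=250, batch_size=100)
# remaining == 750, batches == 7 (the final 50 samples cannot fill a batch of 100)
```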

n_targets

Get the number of targets.

Returns

The number of targets.

Return type

int

next_sample(batch_size=1)

If enough instances remain to supply batch_size samples, the next batch_size samples are returned. Otherwise, the tuple (None, None) is returned.

Parameters

batch_size (int) – The number of instances to return.

Returns

The next batch_size instances. For general purposes, the returned values can be treated as numpy.ndarray objects.

Return type

tuple or tuple list
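With the real class, the documented sequence is to construct a DataStream, call prepare_for_use(), and then call next_sample() while has_more_samples() returns True. The self-contained sketch below imitates that forward-only loop with a hypothetical ToyStream class (not skmultiflow code), so the (None, None) rule for a short final read is visible without installing the library.

```python
import numpy as np


class ToyStream:
    """Hypothetical forward-only stream over in-memory (X, y) arrays."""

    def __init__(self, X, y):
        self.X = np.asarray(X)
        self.y = np.asarray(y)
        self.sample_idx = 0  # index of the next sample to hand out

    def n_remaining_samples(self):
        return len(self.X) - self.sample_idx

    def has_more_samples(self):
        return self.n_remaining_samples() > 0

    def next_sample(self, batch_size=1):
        # Follow the documented contract: a short read returns (None, None).
        if self.n_remaining_samples() < batch_size:
            return None, None
        start = self.sample_idx
        self.sample_idx += batch_size  # earlier samples are never served again
        return self.X[start:self.sample_idx], self.y[start:self.sample_idx]


stream = ToyStream(np.arange(10).reshape(5, 2), [0, 1, 0, 1, 1])
seen_labels = []
while stream.has_more_samples():
    X_batch, y_batch = stream.next_sample(batch_size=2)
    if X_batch is None:  # only 1 sample was left, not enough for a batch of 2
        break
    seen_labels.append(y_batch.tolist())
```

Note that has_more_samples() can still return True when fewer than batch_size samples remain, so a loop like this one needs the explicit None check.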

prepare_for_use()

Prepares the stream for use. This function should always be called after the stream is initialized.

print_df()

Prints all the samples in the stream.

random_state

Retrieve the random state of the stream.

Returns

Return type

RandomState

restart()

Restarts the stream’s sample feeding, while keeping all of its parameters.

In effect, it reinitializes the stream to its initial state.
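A restart that keeps all parameters can be as simple as rewinding the read position while leaving the data untouched. RestartableToyStream is a hypothetical class used only to show that idea; its next_sample omits the end-of-data check for brevity, unlike the documented method.

```python
import numpy as np


class RestartableToyStream:
    """Hypothetical in-memory stream that can be rewound to its first sample."""

    def __init__(self, X, y):
        self.X = np.asarray(X)
        self.y = np.asarray(y)
        self.restart()

    def restart(self):
        # Only the read position is reset; X, y and any settings are kept.
        self.sample_idx = 0

    def next_sample(self, batch_size=1):
        # Simplified: no end-of-data check, unlike the documented next_sample.
        start = self.sample_idx
        self.sample_idx += batch_size
        return self.X[start:self.sample_idx], self.y[start:self.sample_idx]


stream = RestartableToyStream([[10], [20], [30]], [0, 1, 0])
first_pass, _ = stream.next_sample(3)
stream.restart()
second_pass, _ = stream.next_sample(3)  # replays exactly the same samples
```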

target_idx

Get the index of the column where the targets (y) begin.

Returns

The index of the column where the targets begin.

Return type

int

target_names

Retrieve the names of the targets.

Returns

the names of the targets in the stream.

Return type

list

target_values

Retrieve all target_values in the stream for each target.

Returns

list of lists of all target_values for each target

Return type

list
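A list of lists of target values can be built by collecting the distinct values of each target column. The helper target_values below is hypothetical and uses numpy.unique as one straightforward way to do this; the library may build the list differently (for example, it may return a flat list when there is a single target).

```python
import numpy as np


def target_values(y):
    """Hypothetical helper: sorted distinct values of each target column."""
    y = np.asarray(y)
    if y.ndim == 1:
        y = y.reshape(-1, 1)  # treat a single target as one column
    return [np.unique(y[:, j]).tolist() for j in range(y.shape[1])]


y = np.array([[1, 0],
              [0, 0],
              [1, 2]])
values = target_values(y)
# values == [[0, 1], [0, 2]]
```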

y

Return the targets’ columns.

Returns

the targets’ columns

Return type

np.ndarray