skmultiflow.data.FileStream

class skmultiflow.data.FileStream(filepath, target_idx=-1, n_targets=1, cat_features=None)[source]

Creates a stream from a file source.

Currently only CSV files are supported. The goal is to support other formats, provided there is a function that reads and interprets the file and returns the data as a pandas DataFrame or numpy.ndarray.

Parameters
  • filepath – Path to the data file

  • target_idx (int, optional (default=-1)) – The column index from which the targets start.

  • n_targets (int, optional (default=1)) – The number of targets.

  • cat_features (list, optional (default=None)) – A list of indices corresponding to the location of categorical features.
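As a rough illustration of how target_idx and n_targets could partition the columns of a row, here is a minimal sketch. It assumes the targets are the n_targets consecutive columns starting at target_idx and that every other column is a feature. The helper split_row is hypothetical and is not part of skmultiflow.

```python
def split_row(row, target_idx=-1, n_targets=1):
    """Split one row into (features, targets).

    Hypothetical helper: targets are the n_targets columns starting at
    target_idx; every other column is treated as a feature.
    """
    n_cols = len(row)
    start = target_idx % n_cols  # map a negative index to its position
    stop = start + n_targets
    if stop > n_cols:
        raise ValueError("target_idx + n_targets runs past the last column")
    targets = row[start:stop]
    features = row[:start] + row[stop:]
    return features, targets


# Defaults: the last column is the single target.
print(split_row([0.080429, 8.397187, 7.074928, 0]))
# Two targets starting at column 1; columns 0, 3 and 4 remain features.
print(split_row([10, 20, 30, 40, 50], target_idx=1, n_targets=2))
```

With the defaults, a four-column row yields three features and one target, which matches the shape of the samples shown in the Examples section.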

Notes

The stream object returns a requested number of samples on demand, and samples that have already been returned cannot be accessed again later. This is done to correctly simulate a streaming context.
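The forward-only access pattern described above can be pictured as a cursor that only advances. The sketch below is a simplified stand-in, not the library's implementation: the class ForwardOnlyStream and its internals are hypothetical, and only the method names mirror the ones documented on this page.

```python
class ForwardOnlyStream:
    """Hypothetical stand-in that mimics the forward-only access pattern."""

    def __init__(self, rows):
        self._rows = list(rows)
        self._cursor = 0  # index of the next unread sample

    def next_sample(self, batch_size=1):
        batch = self._rows[self._cursor:self._cursor + batch_size]
        self._cursor += len(batch)  # the cursor only ever moves forward
        return batch

    def n_remaining_samples(self):
        return len(self._rows) - self._cursor

    def has_more_samples(self):
        return self.n_remaining_samples() > 0


stream = ForwardOnlyStream(range(5))
first = stream.next_sample()    # serves sample 0
second = stream.next_sample(3)  # serves samples 1, 2, 3; sample 0 is gone
print(first, second, stream.n_remaining_samples(), stream.has_more_samples())
```

Because there is no method that moves the cursor backwards, a consumer sees each sample once, in file order.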

Examples

>>> # Imports
>>> from skmultiflow.data.file_stream import FileStream
>>> # Setup the stream
>>> stream = FileStream('skmultiflow/data/datasets/sea_stream.csv')
>>> stream.prepare_for_use()
>>> # Retrieving one sample
>>> stream.next_sample()
(array([[0.080429, 8.397187, 7.074928]]), array([0]))
>>> # Retrieving 10 samples
>>> stream.next_sample(10)
(array([[1.42074 , 7.504724, 6.764101],
    [0.960543, 5.168416, 8.298959],
    [3.367279, 6.797711, 4.857875],
    [9.265933, 8.548432, 2.460325],
    [7.295862, 2.373183, 3.427656],
    [9.289001, 3.280215, 3.154171],
    [0.279599, 7.340643, 3.729721],
    [4.387696, 1.97443 , 6.447183],
    [2.933823, 7.150514, 2.566901],
    [4.303049, 1.471813, 9.078151]]),
    array([0, 0, 1, 1, 1, 1, 0, 0, 1, 0]))
>>> stream.n_remaining_samples()
39989
>>> stream.has_more_samples()
True
__init__(filepath, target_idx=-1, n_targets=1, cat_features=None)[source]

Initialize self. See help(type(self)) for accurate signature.

Methods

__init__(filepath[, target_idx, n_targets, …])

Initialize self.

get_all_samples()

Returns all the samples in the stream.

get_data_info()

Retrieves minimum information from the stream.

get_info()

Collects and returns the information about the configuration of the estimator.

get_params([deep])

Get parameters for this estimator.

get_target_values()

has_more_samples()

Checks if stream has more samples.

is_restartable()

Determine if the stream is restartable.

last_sample()

Retrieves last batch_size samples in the stream.

n_remaining_samples()

Returns the estimated number of remaining samples.

next_sample([batch_size])

If there are enough instances to supply at least batch_size samples, those are returned.

prepare_for_use()

Prepares the stream for use.

reset()

Resets the estimator to its initial state.

restart()

Restarts the stream’s sample feeding, while keeping all of its parameters.

set_params(**params)

Set the parameters of this estimator.

Attributes

cat_features_idx

Get the list of categorical feature indices.

feature_names

Retrieve the names of the features.

n_cat_features

Retrieve the number of categorical features.

n_features

Retrieve the number of features.

n_num_features

Retrieve the number of numerical features.

n_targets

Get the number of targets.

target_idx

Get the number of the column where Y begins.

target_names

Retrieve the names of the targets.

target_values

Retrieve all target_values in the stream for each target.

property cat_features_idx

Get the list of categorical feature indices.

Returns

List of categorical feature indices.

Return type

list

property feature_names

Retrieve the names of the features.

Returns

names of the features

Return type

list

get_all_samples()[source]

Returns all the samples in the stream.

Returns

  • X (pd.DataFrame) – The features’ columns.

  • y (pd.DataFrame) – The targets’ columns.

get_data_info()[source]

Retrieves minimum information from the stream.

Used by evaluator methods to identify the stream.

The default format is: ‘Stream name - n_targets, n_classes, n_features’.

Returns

Stream data information

Return type

string

get_info()[source]

Collects and returns the information about the configuration of the estimator.

Returns

Configuration of the estimator.

Return type

string

get_params(deep=True)[source]

Get parameters for this estimator.

Parameters

deep (boolean, optional) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns

params – Parameter names mapped to their values.

Return type

mapping of string to any

has_more_samples()[source]

Checks if stream has more samples.

Returns

True if stream has more samples.

Return type

Boolean

is_restartable()[source]

Determine if the stream is restartable.

Returns

True if stream is restartable.

Return type

Boolean

last_sample()[source]

Retrieves last batch_size samples in the stream.

Returns

A numpy.ndarray of shape (batch_size, n_features) and an array-like of shape (batch_size, n_targets), representing the last batch_size samples.

Return type

tuple or tuple list

property n_cat_features

Retrieve the number of categorical features.

Returns

The number of categorical features in the stream.

Return type

int

property n_features

Retrieve the number of features.

Returns

The total number of features.

Return type

int

property n_num_features

Retrieve the number of numerical features.

Returns

The number of numerical features in the stream.

Return type

int
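The three feature counts are related. A minimal sketch, assuming that every feature not listed in cat_features is counted as numerical, is shown below. The helper count_features is hypothetical and only illustrates the arithmetic.

```python
def count_features(n_features, cat_features=None):
    """Return (n_cat_features, n_num_features) for a feature layout.

    Hypothetical helper: cat_features lists the indices of categorical
    columns; every remaining feature is counted as numerical.
    """
    cat_features = list(cat_features or [])
    if any(idx >= n_features for idx in cat_features):
        raise IndexError("categorical feature index exceeds n_features")
    n_cat = len(cat_features)
    return n_cat, n_features - n_cat


print(count_features(3))          # no categorical features: (0, 3)
print(count_features(5, [0, 3]))  # two categorical, three numerical: (2, 3)
```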

n_remaining_samples()[source]

Returns the estimated number of remaining samples.

Returns

Remaining number of samples.

Return type

int

property n_targets

Get the number of targets.

Returns

The number of targets.

Return type

int

next_sample(batch_size=1)[source]

If there are enough instances to supply at least batch_size samples, those are returned. Otherwise, a tuple of (None, None) is returned.

Parameters

batch_size (int) – The number of instances to return.

Returns

The next batch_size instances. For general purposes, the returned value can be treated as a numpy.ndarray.

Return type

tuple or tuple list
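A caller that wants every sample, including a final partial batch, has to account for the (None, None) case. The sketch below shows one way to do that by shrinking the last request to n_remaining_samples(). TinyStream is a hypothetical stub that follows the contract as documented here, and drain is a hypothetical helper. The behavior of the real class at the end of a file may differ between versions, so treat this as an assumption to verify.

```python
class TinyStream:
    """Hypothetical stub that follows the documented next_sample contract."""

    def __init__(self, X, y):
        self._X, self._y, self._pos = X, y, 0

    def n_remaining_samples(self):
        return len(self._X) - self._pos

    def next_sample(self, batch_size=1):
        if self.n_remaining_samples() < batch_size:
            return None, None  # not enough instances left
        start, self._pos = self._pos, self._pos + batch_size
        return self._X[start:self._pos], self._y[start:self._pos]


def drain(stream, batch_size):
    """Collect every batch, shrinking the final request to what is left."""
    batches = []
    while stream.n_remaining_samples() > 0:
        size = min(batch_size, stream.n_remaining_samples())
        X, y = stream.next_sample(size)
        if X is None:  # defensive check for the (None, None) case
            break
        batches.append((X, y))
    return batches


stream = TinyStream(X=[[1], [2], [3], [4], [5]], y=[0, 1, 0, 1, 1])
print(stream.next_sample(10))                  # (None, None): only 5 available
print([len(X) for X, _ in drain(stream, 2)])   # [2, 2, 1]
```

Checking the remaining count before each request keeps the loop from asking for more samples than the stream can supply.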

prepare_for_use()[source]

Prepares the stream for use.

Notes

This function should always be called after the stream is initialized.

reset()[source]

Resets the estimator to its initial state.

Returns

Return type

self

restart()[source]

Restarts the stream’s sample feeding, while keeping all of its parameters.

It serves to reinitialize the stream to its initial state.

set_params(**params)[source]

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Returns

Return type

self
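The <component>__<parameter> naming convention can be read as a routing rule: the text before the first double underscore names the component, and the remainder is passed on to it. The sketch below illustrates that rule only. The helper route_params is hypothetical and is not how set_params is implemented.

```python
def route_params(params):
    """Group keyword arguments by component using the '__' separator.

    Hypothetical helper: keys without '__' belong to the object itself
    (stored under the empty-string key); 'comp__name' goes to 'comp'.
    """
    routed = {}
    for key, value in params.items():
        component, sep, sub_key = key.partition("__")
        if sep:
            routed.setdefault(component, {})[sub_key] = value
        else:
            routed.setdefault("", {})[key] = value
    return routed


print(route_params({"n_targets": 2, "stream__target_idx": 0}))
# {'': {'n_targets': 2}, 'stream': {'target_idx': 0}}
```

Splitting only on the first separator means a deeper key such as a__b__c is handed to component a as b__c, so each level can route its own parameters in turn.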

property target_idx

Get the number of the column where Y begins.

Returns

The number of the column where Y begins.

Return type

int

property target_names

Retrieve the names of the targets.

Returns

the names of the targets in the stream.

Return type

list

property target_values

Retrieve all target_values in the stream for each target.

Returns

list of lists of all target_values for each target

Return type

list
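To make the "list of lists" shape concrete, the sketch below collects the distinct values of each target column from a handful of samples. The helper collect_target_values is hypothetical, and sorting the values is an assumption made here for readability. The ordering returned by the real property is not specified on this page.

```python
def collect_target_values(y_rows):
    """Return the sorted distinct values of each target column.

    Hypothetical helper; y_rows holds one sequence of targets per sample.
    """
    return [sorted(set(column)) for column in zip(*y_rows)]


# Two targets per sample: a binary label and a three-class label.
print(collect_target_values([(0, "a"), (1, "b"), (0, "c"), (1, "a")]))
# [[0, 1], ['a', 'b', 'c']]
```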