# Using Streams in scikit-multiflow¶

## Stream generators¶

Stream generators are a cheap source of data, since data samples are generated on demand we can avoid storing data physically. There are multiple stream generators in scikit-multiflow and all of them work in a similar way.

Here, we will use the AGRAWALGenerator to exemplify how to use generators within scikit-multiflow

1. Instantiate the Stream generator

generator = AGRAWALGenerator()
generator.prepare_for_use()


The call to prepare_for_use() ensures that the Stream object is ready and must be done before using a Stream object.

2. Get data from the stream

Use next_sample() to obtain data (samples) from any Stream object. The Stream will return n_samples using two arrays: X for features and y for classes (classification) or targets (regression).

X, y = generator.next_sample()
print(X.shape, y.shape)
>>> (1, 9) (1,)


By default, next_sample() returns one sample, but we can pass an arbitrary number of samples as next_sample(n_samples). For example, to get 1000 samples:

X, y = generator.next_sample(1000)
print(X.shape, y.shape)
>>> (1000, 9) (1000,)

1. Check if the stream has more data

When working with streams, it is important to know if there is more data remaining. You can use has_more_samples() to query the Stream for this information.

generator.has_more_samples()
>>> True

1. Restart the stream

To restart a Stream object to its initial state, we can use restart()

generator.restart()


5: Save the data into a csv file [Optional]

There might be cases where we want to store the information obtained from a Stream generator. An easy way to do it is using numpy and pandas. First, we concatenate the X and y arrays into a single np.array. Then we create a DataFrame that is easy manipulate, for example if we want to name the features, pre-process the data, etc.

df = pd.DataFrame(np.hstack((X,np.array([y]).T)))


Finally, to write the data into a csv:

df.to_csv("file.csv")


Putting it all together:

  1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 from skmultiflow.data import AGRAWALGenerator import pandas as pd import numpy as np # 1. Instantiate the stream generator generator = AGRAWALGenerator() generator.prepare_for_use() # 2. Get data from the stream X, y = generator.next_sample() print(X.shape, y.shape) >>> (1, 9) (1,) X, y = generator.next_sample(1000) print(X.shape, y.shape) >>> (1000, 9) (1000,) # 3. Check if the stream has more data generator.has_more_samples() >>> True # 4. Restart the stream generator.restart() # 5. Save data into a csv file [Optional] df = pd.DataFrame(np.hstack((X,np.array([y]).T)))