skmultiflow.drift_detection.ddm

Classes

DDM([min_num_instances, warning_level, …])

DDM method for concept drift detection

class skmultiflow.drift_detection.ddm.DDM(min_num_instances=30, warning_level=2.0, out_control_level=3.0)[source]

DDM method for concept drift detection

Parameters
  • min_num_instances (int (default=30)) – The minimum number of samples that must be analyzed before change can be detected. This avoids false detections during the early moments of the detector, when a single sample carries a large weight in the statistics.

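  • warning_level (float (default=2.0)) – Warning level: the multiplier applied to the minimum recorded standard deviation to set the warning-zone threshold (the factor 2 in the warning condition under Notes).

  • out_control_level (float (default=3.0)) – Out-control level: the multiplier applied to the minimum recorded standard deviation to set the change-detection threshold (the factor 3 in the change condition under Notes).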

Notes

DDM (Drift Detection Method) [1] is a concept change detection method based on the premise of the PAC learning model: the learner’s error rate will decrease as the number of analysed samples increases, as long as the data distribution is stationary.

If the algorithm detects an increase in the error rate that surpasses a calculated threshold, it either signals that change has been detected or enters the warning zone, which warns the user that change may occur in the near future.

The detection threshold is calculated as a function of two statistics, recorded at the instant when (p_i + s_i) is at its minimum:

  • p_min: The minimum recorded error rate.

  • s_min: The minimum recorded standard deviation.

At instant i, the detection algorithm uses:

  • p_i: The error rate at instant i.

  • s_i: The standard deviation at instant i.

The conditions for entering the warning zone and detecting change, shown here with the default levels (warning_level=2.0, out_control_level=3.0), are as follows:

  • if p_i + s_i >= p_min + 2 * s_min -> Warning zone

  • if p_i + s_i >= p_min + 3 * s_min -> Change detected
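The rule above can be sketched in a few lines of plain Python. This is an illustrative re-implementation under stated assumptions, not the library's code: the class name SimpleDDM and its update method are hypothetical, the standard deviation formula sqrt(p * (1 - p) / i) is taken from Gama et al. [1] rather than from this page, and the sketch does not reset itself after a change.

```python
import math


class SimpleDDM:
    """Illustrative sketch of the DDM rule (hypothetical, not the library class)."""

    def __init__(self, min_num_instances=30, warning_level=2.0, out_control_level=3.0):
        self.min_num_instances = min_num_instances
        self.warning_level = warning_level
        self.out_control_level = out_control_level
        self.n = 0              # number of samples seen so far (instant i)
        self.p = 0.0            # running error rate at instant i
        self.p_min = math.inf   # error rate recorded when p + s was minimal
        self.s_min = math.inf   # std. dev. recorded when p + s was minimal

    def update(self, error):
        """Feed one 0/1 error indicator; return 'stable', 'warning' or 'change'."""
        self.n += 1
        self.p += (error - self.p) / self.n  # incremental mean of the errors
        # Binomial standard deviation (assumption: formula from Gama et al.)
        s = math.sqrt(max(self.p * (1.0 - self.p), 0.0) / self.n)
        if self.n < self.min_num_instances:
            return "stable"  # too few samples to trust the statistics
        if self.p + s <= self.p_min + self.s_min:
            # New minimum of p + s: record it; nothing to signal at a minimum
            self.p_min, self.s_min = self.p, s
            return "stable"
        if self.p + s >= self.p_min + self.out_control_level * self.s_min:
            return "change"
        if self.p + s >= self.p_min + self.warning_level * self.s_min:
            return "warning"
        return "stable"


# 200 samples with a 10% error rate, then 100 consecutive errors
stream = [1 if i % 10 == 9 else 0 for i in range(200)] + [1] * 100
detector = SimpleDDM()
states = [detector.update(e) for e in stream]
print(states.index("warning"), states.index("change"))
```

In this synthetic stream the detector stays stable while the error rate hovers around 10%, enters the warning zone shortly after the errors start to accumulate, and reports change a few samples later.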

References

[1] João Gama, Pedro Medas, Gladys Castillo, Pedro Pereira Rodrigues: Learning with Drift Detection. SBIA 2004: 286-295

Examples

>>> # Imports
>>> import numpy as np
>>> from skmultiflow.drift_detection import DDM
>>> ddm = DDM()
>>> # Simulating a data stream of 0's and 1's drawn uniformly at random
>>> # (1 marks a misclassified sample, 0 a correctly classified one)
>>> data_stream = np.random.randint(2, size=2000)
>>> # Changing the data concept from index 999 to 1499, simulating an
>>> # increase in error rate (every sample in this range is an error)
>>> for i in range(999, 1500):
...     data_stream[i] = 1
>>> # Adding stream elements to DDM and verifying if drift occurred
>>> for i in range(2000):
...     ddm.add_element(data_stream[i])
...     if ddm.detected_warning_zone():
...         print('Warning zone has been detected in data: ' + str(data_stream[i]) + ' - of index: ' + str(i))
...     if ddm.detected_change():
...         print('Change has been detected in data: ' + str(data_stream[i]) + ' - of index: ' + str(i))

add_element(prediction)[source]

Add a new element to the statistics

Parameters

prediction (int (either 0 or 1)) – Indicates whether the last sample analyzed was correctly classified or not. 1 indicates an error (misclassification); 0 indicates a correct classification.

Notes

After calling this method, call the inherited method detected_change to verify whether change was detected; it returns True if concept drift was detected and False otherwise. To check whether the learner is in the warning zone, call detected_warning_zone.
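Because add_element expects an error indicator rather than a raw class label, a learner's predictions have to be compared with the true labels first. The following is a minimal sketch of that conversion; the helper name to_error_indicators and the sample labels are hypothetical, and each resulting value would then be passed to add_element one at a time.

```python
def to_error_indicators(y_true, y_pred):
    """Return 1 where the prediction is wrong and 0 where it is right."""
    return [int(t != p) for t, p in zip(y_true, y_pred)]


# Hypothetical true labels and predictions from some classifier
y_true = [0, 1, 1, 0, 1]
y_pred = [0, 1, 0, 0, 0]
errors = to_error_indicators(y_true, y_pred)
print(errors)  # [0, 0, 1, 0, 1]
```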

detected_change()[source]

This function returns whether concept drift was detected or not.

Returns

Whether concept drift was detected or not.

Return type

bool

detected_warning_zone()[source]

If the change detector supports the warning zone, this function will return whether it’s inside the warning zone or not.

Returns

Whether the change detector is in the warning zone or not.

Return type

bool

get_class_type()[source]

The class type is a string that identifies the type of object generated by that module.

Returns

The class type.

Return type

string

get_info()[source]

Collect information about the concept drift detector.

Returns

Configuration for the concept drift detector.

Return type

string

get_length_estimation()[source]

Returns the length estimation.

Returns

The length estimation

Return type

int

reset()[source]

Resets the change detector parameters.