Adds a ConfigurableDataSource data generator for AnomalyDetection #160
Description
Adds a configurable data source for anomaly detection demos and testing, and updates the anomaly detection tutorial to use it.
Motivation
The reproducibility system we're adding in 4.2 should be able to reproduce all the tutorial models. Unfortunately, several of those models are built on static data generators, which would require special-casing in the reproducibility system. Those data generators also aren't flexible enough to be useful demos, or complex enough to stress more complicated algorithms (in terms of both statistical complexity and runtime speed). This PR is the first in a series that will add configurable data sources which generate data for anomaly detection, clustering, multi-class classification and multi-label classification. We'll update some test code, and any tutorials which depend on the static data generators, to use the new configurable data generators. The old static generators won't go away; they are still useful for unit testing basic feature handling functionality, where it helps to have something extremely simple.
Regression already has two such configurable generators, added in 4.1 as part of the test harness for the output scaling feature; those generators will remain unchanged.