Adds a ConfigurableDataSource data generator for Clustering #161

Merged: 3 commits into main from cluster-data-source, Aug 19, 2021

Craigacp
Member

Description

Adds a configurable data source for clustering demos and testing. Updated the clustering tutorial to use it.

The timing numbers are from my M1 Mac, which is fast enough that the multithreaded run takes roughly the same time as the single-threaded one rather than being 3x faster (though the multithreaded run is on 4x the data).

Motivation

See #160 for the motivation.

@Craigacp added the "Oracle employee" label (This PR is from an Oracle employee) Aug 18, 2021
@Craigacp changed the title from "Adds a ConfigurableDataSource generator for Clustering" to "Adds a ConfigurableDataSource data generator for Clustering" Aug 18, 2021
Member

@jhalexand left a comment

The description for "mixingPMF" (and/or the name) could probably be a little clearer; it should be updated both in the @Config variable's description and in the constructor.

It occurs to me that we should be checking that the variances are positive numbers, both because variances are usually expressed as positive numbers and, of course, because we take the square root of them.
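
As a rough illustration of the kind of check meant here, a minimal sketch in Java follows. The class name `VarianceCheck` and the method `toStdDevs` are hypothetical and are not taken from the PR or from Tribuo's API; the sketch assumes the variances arrive as a `double[]` and are converted to standard deviations via a square root.

```java
// Hypothetical helper for illustration only; not the code in this PR.
final class VarianceCheck {
    private VarianceCheck() {}

    /**
     * Validates that every variance is positive and finite, then returns
     * the corresponding standard deviations.
     */
    static double[] toStdDevs(double[] variances) {
        double[] stdDevs = new double[variances.length];
        for (int i = 0; i < variances.length; i++) {
            double v = variances[i];
            // !(v > 0.0) rejects zero, negative values, and NaN in one test.
            if (!(v > 0.0) || Double.isInfinite(v)) {
                throw new IllegalArgumentException(
                    "Variance must be positive and finite, found " + v + " at index " + i);
            }
            // Safe to take the square root now that v is known to be positive.
            stdDevs[i] = Math.sqrt(v);
        }
        return stdDevs;
    }
}
```

Failing fast in the constructor or post-config step, rather than letting `Math.sqrt` silently return NaN for a negative input, would surface a bad configuration before any data is generated.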

@jhalexand
Member

It also occurs to me that the same issue with variances exists in GaussianAnomalyDataSource and could be fixed along with this PR.

@Craigacp
Member Author

Sure, I'll fix those.

@Craigacp force-pushed the cluster-data-source branch from f803a68 to 991061f August 18, 2021 15:27
@Craigacp
Member Author

I've updated the PR with your suggestions.

Member

@jhalexand left a comment

Looks good!

@jhalexand merged commit 62e2860 into main Aug 19, 2021
@jhalexand deleted the cluster-data-source branch August 19, 2021 14:35