[ML] Refactor DataStreamDiagnostics to use array #30129

dimitris-athanasiou · 2018-04-25T13:22:59Z

This commit refactors the DataStreamDiagnostics class, with the following advantages:

- simpler code, by encapsulating the moving bucket histogram in its own class
- better performance, by using an array instead of a map to store the buckets
- explicit handling of gap buckets, in preparation for fixing #30080 ([ML] bucket_count is inaccurate when there are gaps in the data)
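
For readers without the diff at hand, here is a minimal sketch of the general idea of keeping a moving window of bucket counts in a fixed-size array. It is not the code from this PR: the class name MovingBucketCounts, its methods, and the choice to silently ignore records older than the window are all assumptions made for illustration. Timestamps are assumed to be non-negative.

```java
// Hypothetical illustration only; not the BucketHistogram class from this PR.
final class MovingBucketCounts {
    private final long bucketSpanMs;
    private final long[] counts;          // circular buffer, one slot per open bucket
    private long windowStartBucket = -1;  // bucket number (timestamp / span) of the oldest open bucket
    private long flushedBuckets = 0;
    private long emptyFlushedBuckets = 0; // flushed buckets that held no records (gaps)

    MovingBucketCounts(long bucketSpanMs, int maxSize) {
        if (bucketSpanMs <= 0 || maxSize <= 0) {
            throw new IllegalArgumentException("bucketSpanMs and maxSize must be positive");
        }
        this.bucketSpanMs = bucketSpanMs;
        this.counts = new long[maxSize];
    }

    void addRecord(long timestampMs) {
        long bucket = timestampMs / bucketSpanMs;
        if (windowStartBucket < 0) {
            windowStartBucket = bucket;   // first record defines the window start
        }
        if (bucket < windowStartBucket) {
            return;                       // older than the window; ignored in this sketch
        }
        // Slide the window forward until the record's bucket fits in the array.
        while (bucket >= windowStartBucket + counts.length) {
            flushOldest();
        }
        counts[(int) (bucket % counts.length)]++;
    }

    private void flushOldest() {
        int slot = (int) (windowStartBucket % counts.length);
        if (counts[slot] == 0) {
            emptyFlushedBuckets++;        // a gap bucket is counted explicitly
        }
        flushedBuckets++;
        counts[slot] = 0;                 // the slot is reused for a newer bucket
        windowStartBucket++;
    }

    long flushedBuckets() { return flushedBuckets; }

    long emptyFlushedBuckets() { return emptyFlushedBuckets; }
}
```

Because the slot is computed from the bucket number, no map lookups or per-bucket allocations are needed, and empty buckets are seen one by one as the window slides past them.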

hendrikmuhs · 2018-04-25T13:43:35Z

...gin/ml/src/main/java/org/elasticsearch/xpack/ml/job/process/diagnostics/BucketHistogram.java

+    public BucketHistogram(Job job, BucketFlushListener bucketFlushListener) {
+        bucketSpanMs = job.getAnalysisConfig().getBucketSpan().millis();
+        latencyMs = job.getAnalysisConfig().getLatency() == null ? 0 : job.getAnalysisConfig().getLatency().millis();
+        maxSize = Math.max((int) (latencyMs / bucketSpanMs), MIN_BUCKETS);


I suspect an off-by-one error:

latencyMs = 11, bucketSpanMs = 4

would (ignoring MIN_BUCKETS) result in maxSize = 2:

bucket[0] => (0, 4]
bucket[1] => (4, 8]

If you get a result for t = 9, there is no bucket[2] to place it in, and because the latency is 11 you cannot finalize bucket[0] yet. So I think you must round up, not down.

Ah, yes. I forgot to test how it works with latency and there aren't any unit tests. I'll fix this and add some tests as well.

(latencyMs + bucketSpanMs - 1) / bucketSpanMs will return the right value.

Pushed the fix.
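
To make the rounding difference in this thread concrete, here is a small sketch comparing the expression from the diff with the suggested one. The class and method names (BucketSizing, floorBuckets, ceilBuckets) are hypothetical and do not come from the PR. The rounding-up form assumes a positive bucketSpanMs and a non-negative latencyMs.

```java
// Hypothetical helper, not part of the PR: compares the two sizing expressions.
final class BucketSizing {
    private BucketSizing() {}

    // Expression from the diff: long division truncates, so the result rounds down.
    static int floorBuckets(long latencyMs, long bucketSpanMs) {
        return (int) (latencyMs / bucketSpanMs);
    }

    // Suggested expression: adding (bucketSpanMs - 1) before dividing rounds up.
    static int ceilBuckets(long latencyMs, long bucketSpanMs) {
        return (int) ((latencyMs + bucketSpanMs - 1) / bucketSpanMs);
    }

    public static void main(String[] args) {
        System.out.println(floorBuckets(11, 4)); // prints 2: no slot for t = 9
        System.out.println(ceilBuckets(11, 4));  // prints 3
        System.out.println(ceilBuckets(8, 4));   // prints 2: exact multiples are unchanged
    }
}
```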

davidkyle · 2018-04-25T16:15:38Z

...gin/ml/src/main/java/org/elasticsearch/xpack/ml/job/process/diagnostics/BucketHistogram.java

+    public BucketHistogram(Job job, BucketFlushListener bucketFlushListener) {
+        bucketSpanMs = job.getAnalysisConfig().getBucketSpan().millis();
+        latencyMs = job.getAnalysisConfig().getLatency() == null ? 0 : job.getAnalysisConfig().getLatency().millis();
+        maxSize = Math.max((int) (latencyMs / bucketSpanMs), MIN_BUCKETS);


(latencyMs + bucketSpanMs - 1) / bucketSpanMs will return the right value.

davidkyle · 2018-04-25T16:17:36Z

...gin/ml/src/main/java/org/elasticsearch/xpack/ml/job/process/diagnostics/BucketHistogram.java

+    private long latestFlushedBucketStartMs = -1;
+    private final BucketFlushListener bucketFlushListener;
+
+    public BucketHistogram(Job job, BucketFlushListener bucketFlushListener) {


Only bucketSpanMs and latencyMs are needed; the job itself isn't, so you can break that dependency. Also, the name BucketHistogram is very generic; perhaps BucketDiagnostics?

I like the rename to BucketDiagnostics. Pushed a commit that fixes that.

As for removing the dependency on job, I can see the point, but it lengthens the argument list, and the dependency does not seem out of place here. I could easily imagine more job parameters being used in the diagnostics.
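
For comparison, the alternative raised in this thread (passing the two values instead of the Job) could look roughly like the sketch below. Everything here is hypothetical: the class name BucketDiagnosticsSketch, the accessor names, and the value of MIN_BUCKETS, which is not visible in the diff. The listener argument is left out to keep the sketch self-contained.

```java
// Hypothetical sketch of the Job-free constructor; not the code that was merged.
final class BucketDiagnosticsSketch {
    // Assumed value for illustration; the real constant is not shown in the diff.
    static final int MIN_BUCKETS = 10;

    private final long bucketSpanMs;
    private final long latencyMs;
    private final int maxSize;

    // Takes the two values directly, so callers and unit tests need no Job instance.
    BucketDiagnosticsSketch(long bucketSpanMs, long latencyMs) {
        if (bucketSpanMs <= 0) {
            throw new IllegalArgumentException("bucketSpanMs must be positive");
        }
        if (latencyMs < 0) {
            throw new IllegalArgumentException("latencyMs must not be negative");
        }
        this.bucketSpanMs = bucketSpanMs;
        this.latencyMs = latencyMs;
        // Round up so that a partially covered bucket still gets a slot.
        this.maxSize = Math.max((int) ((latencyMs + bucketSpanMs - 1) / bucketSpanMs), MIN_BUCKETS);
    }

    long bucketSpanMs() { return bucketSpanMs; }

    long latencyMs() { return latencyMs; }

    int maxSize() { return maxSize; }
}
```

The trade-off is the one described above: the constructor is easier to test in isolation, but every further job parameter the diagnostics might need becomes another argument.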

hendrikmuhs

LGTM

dimitris-athanasiou · 2018-04-30T10:17:15Z

retest this please

dimitris-athanasiou · 2018-05-01T08:09:59Z

run sample packaging tests

dimitris-athanasiou added review v7.0.0 >refactoring :ml Machine learning v6.4.0 labels Apr 25, 2018

dimitris-athanasiou requested a review from hendrikmuhs April 25, 2018 13:25

hendrikmuhs reviewed Apr 25, 2018

davidkyle reviewed Apr 25, 2018

hendrikmuhs approved these changes Apr 27, 2018

dimitris-athanasiou added 4 commits April 30, 2018 16:47

Fix latency

5ea9de4

Rename BucketHistogram to BucketDiagnostics

3b74f00

Remove public qualifiers from package private class

09186bc

dimitris-athanasiou force-pushed the refactor-ml-data-stream-diagnostics branch from fee5ceb to 09186bc Compare April 30, 2018 15:49

dimitris-athanasiou merged commit 057cdff into elastic:master May 1, 2018

colings86 added v7.0.0-beta1 and removed v7.0.0 labels Feb 7, 2019
