Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add ml_job metricset to Elasticsearch module #7196

Merged
merged 5 commits into from
Jul 11, 2018
Merged

Conversation

ruflin
Copy link
Contributor

@ruflin ruflin commented May 29, 2018

This adds a basic ml_job metricset to the Elasticsearch module.

The testing was enhanced to enabled to also test non x-pack Basic features by enabling the trial license as part of the test run.

@ruflin ruflin added in progress Pull request is currently in progress. module Metricbeat Metricbeat labels May 29, 2018
jobPath = "/_xpack/ml/anomaly_detectors/_all/_stats"
)

type MetricSet struct {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

exported type MetricSet should have comment or be unexported

@@ -0,0 +1,46 @@
package ml_job

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't use an underscore in package name

@@ -0,0 +1,50 @@
package ml_job

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't use an underscore in package name

@ruflin
Copy link
Contributor Author

ruflin commented May 31, 2018

This PR depends on the release of ES 6.3.

@ruflin ruflin force-pushed the ml-stats branch 2 times, most recently from cb784cc to b3bb590 Compare June 26, 2018 12:15
@ruflin ruflin added review and removed in progress Pull request is currently in progress. labels Jul 5, 2018
"time_field":"timestamp",
"time_format": "epoch_ms"
}
}`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indentation? And/or move this to a file?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will use the file that I already created for the system tests ...

@@ -48,7 +52,14 @@ var metricSets = []string{
func TestFetch(t *testing.T) {
compose.EnsureUp(t, "elasticsearch")

err := createIndex(getEnvHost() + ":" + getEnvPort())
host := getEnvHost() + ":" + getEnvPort()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

net.JoinHostPort()?

r.Error(err)
return []error{err}
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Return error also if jobsData.Jobs is nil?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would say in case it's empty no error should be returned.

try:
es.transport.perform_request('POST', "/_xpack/license/start_trial?acknowledge=true")
except:
print "Trial already enabled"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could print the error instead, in case it is caused by other reason.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

if response["count"] > 0:
return

file = os.path.join(self.beat_path, "module", "elasticsearch", "ml_job", "_meta", "test", "test_job.json")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we use this file also for the integration test?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

"total_partition_field_count": 0,
"bucket_allocation_failures_count": 0,
"memory_status": "ok",
"log_time": 1525847933184
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we going to collect more metrics?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. This PR mainly creates the foundation.


// Not master, no event sent
if !isMaster {
logp.Debug("elasticsearch", "Trying to fetch index summary stats from a none master node.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Spelling: "none master" should be "non-master"

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, "index summary stats" should probably be "machine learning job stats".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both fixed.


content, err := m.HTTP.FetchContent()
if err != nil {
r.Error(err)
Copy link
Contributor

@ycombinator ycombinator Jul 5, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When X-Pack isn't installed in ES, this if condition becomes true because a 400 error code is returned by the call to the /_xpack/ml/anomaly_detectors/_all/_stats API. This then results in error documents being indexed in the metricbeat index that look like this:

{
  "@timestamp":"2018-07-05T16:35:40.258Z",
  "metricset":{
    "rtt":6264,
    "namespace":"elasticsearch.ml.job",
    "name":"ml_job",
    "module":"elasticsearch",
    "host":"localhost:9200"
  },
  "error":{
    "message":"HTTP error 400 in ml_job: 400 Bad Request"
  },
  "beat":{
    "version":"7.0.0-alpha1",
    "name":"Shaunaks-MBP-2",
    "hostname":"Shaunaks-MBP-2"
  },
  "host":{
    "name":"Shaunaks-MBP-2"
  }
}

Just want to confirm that this is the behavior we want? Another option could be to log an error (but not index anything) if this API returns 400?

[EDIT] Obviously, whatever approach we take here should be consistent with all X-Pack APIs/metricsets.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively, could we somehow make use of the xpack config setting here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If someone enables the ml_job metricset and X-Pack is not enabled I think an error document should be returned. What we probably should improve is the error message above and have something inside that it is not enabled.

I think our xpack config is not related to this.

"state": c.Str("state"),
"data_counts": c.Dict("data_counts", s.Schema{
"processed_record_count": c.Int("processed_record_count"),
}),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be worth including invalid_date_count as well. From https://www.elastic.co/guide/en/elasticsearch/reference/6.2/ml-jobstats.html#ml-datacounts:

invalid_date_count
(long) The number of records with either a missing date field or a date that could not be parsed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added. In general I think we will have to add more fields but I also would like to involve the ML team here in a follow up. I don't think we should send all the stats but only the ones that are most relevant which is a tricky question to answer.

@@ -0,0 +1,3 @@
=== elasticsearch ml MetricSet
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this say ml_job instead of ml?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed

}

// createIndex creates and elasticsearch index in case it does not exit yet
func enableLicense(host string) error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Name this enableTrialLicense just to be more specific about what it actually does?

Jobs []map[string]interface{} `json:"jobs"`
}

func eventsMapping(r mb.ReporterV2, content []byte) []error {
Copy link
Contributor

@ycombinator ycombinator Jul 5, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like the return value is being ignored anyway so my following comment my be moot:

Looking at other metricsets in the elasticsearch module, it looks like their eventsMapping functions return error instead of []error. Further, to build this error return object they use var errs multierror.Errors to collect multiple errors and then return errs.Err() at the end of the function. For consistency maybe the eventsMapping function here should do the same?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, need to change it. This changed recently with the schema change from Jaime. Will adapt it accordingly.

ruflin added 4 commits July 10, 2018 10:10
This adds a basic ml_job metricset to the Elasticsearch module.

The testing was enhanced to enabled to also test non x-pack Basic features by enabling the trial license as part of the test run.
@@ -191,7 +180,7 @@ func createMLJob(host string) error {
return nil
}

req, err := http.NewRequest("PUT", "http://"+host+jobURL, strings.NewReader(mlJob))
req, err := http.NewRequest("PUT", "http://"+host+jobURL, bytes.NewReader(mlJob))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is more of question for my learning, not a review suggestion: why is bytes.NewReader more appropriate here than strings.NewReader?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main difference is that one takes a []byte array and the other on a string as param. ReadFile above returns a []byte array and previously it was a string (mlJob var).

Copy link
Contributor

@ycombinator ycombinator left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

(I asked a question but it's incidental to the review).

@jsoriano jsoriano merged commit 73751e0 into elastic:master Jul 11, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants