
[ML-Dataframe] Add Data Frame client to the Java HLRC #39921

Merged
5 commits merged into elastic:master on Mar 14, 2019

Conversation

davidkyle
Member

Adds DataFrameClient to the Java HLRC and implements PUT and DELETE data frame transform.

The documentation needs fleshing out with descriptions of the data frame config objects and examples.
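For reviewers, a rough sketch of how the new surface might be exercised. DeleteDataFrameTransformRequest appears in this PR's tests; PutDataFrameTransformRequest, the `dataFrame()` accessor, and the response type are assumptions about the API's shape, not its final form:

```java
// Sketch only: names other than DeleteDataFrameTransformRequest are assumed.
// transformConfig is a DataFrameTransformConfig built elsewhere.

// PUT: register a new data frame transform from a config object
AcknowledgedResponse putResponse = client.dataFrame().putDataFrameTransform(
        new PutDataFrameTransformRequest(transformConfig), RequestOptions.DEFAULT);

// DELETE: remove the transform again by its id
AcknowledgedResponse deleteResponse = client.dataFrame().deleteDataFrameTransform(
        new DeleteDataFrameTransformRequest(transformConfig.getId()), RequestOptions.DEFAULT);
```

Both calls are sketched as returning an acknowledged-style response, mirroring the `ack.isAcknowledged()` assertion in the PR's tests.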

@elasticmachine
Collaborator

Pinging @elastic/ml-core

@elasticmachine
Collaborator

Pinging @elastic/es-core-features

[id="{upid}-{api}"]
=== Put Data Frame Transform API

The Put Data Frame Transform API is used to create a new {dataframe-job}.
Member
I am not sure how {dataframe-job} is defined in the docs. How do these types of macros work?

Contributor

According to the link this resolves to a "data frame analytics job", which would point to the wrong docs. I think we need new macros: {dataframe-transform} or {dataframe-transform-job}; whatever we choose should be consistent everywhere. Because this is called "Put Data Frame Transform API" it would make sense to use {dataframe-transform}.

Member Author

'data frame analytics job' is a mouthful; I raised elastic/docs#700.

@hendrikmuhs left a comment

I added some comments and open questions.


import static org.elasticsearch.common.xcontent.XContentFactory.jsonBuilder;

public class DataFrameIT extends ESRestHighLevelClientTestCase {


nit: DataFrameTransformIT ?


ack = execute(new DeleteDataFrameTransformRequest(transform.getId()), client::deleteDataFrameTransform,
client::deleteDataFrameTransformAsync);
assertTrue(ack.isAcknowledged());


nit: Would be good to test that e.g. another delete throws an error
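The suggested negative test could look roughly like this (a sketch; the expected exception type and status code are assumptions about how the endpoint reports a missing transform):

```java
// Sketch: a second delete of the same transform should fail,
// since the transform no longer exists after the first delete.
ElasticsearchStatusException e = expectThrows(ElasticsearchStatusException.class,
        () -> execute(new DeleteDataFrameTransformRequest(transform.getId()),
                client::deleteDataFrameTransform, client::deleteDataFrameTransformAsync));
assertThat(e.status(), equalTo(RestStatus.NOT_FOUND));
```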


import static org.elasticsearch.common.xcontent.XContentFactory.jsonBuilder;

public class DataFrameDocumentationIT extends ESRestHighLevelClientTestCase {


"DataFrameTransformDocumentation" ?


<2> The source index or index pattern
<3> The destination index
<4> Optionally a QueryConfig
<5> The PivotConfig


We could make this somewhat future proof, e.g.

The configuration object of the function, in this version we only support the pivot function.

(please help me regarding the wording)

I suggest calling the inner "thing" the "function" of the transform. We briefly discussed this once; it somehow fits, in my opinion, but I am open to other suggestions.
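For illustration, the configuration described by the callouts might be assembled like this. QueryConfig and PivotConfig are class names from this PR, but the constructor shapes and DataFrameTransformConfig's exact signature are assumptions:

```java
// Sketch: assumed constructor shapes for the config classes in this PR.
QueryConfig queryConfig = new QueryConfig(new MatchAllQueryBuilder());     // <4> optional query
PivotConfig pivotConfig = new PivotConfig(groupConfig, aggregationConfig); // <5> the pivot "function"
DataFrameTransformConfig config = new DataFrameTransformConfig("my-transform",
        "source-index",   // <2> the source index or index pattern
        "dest-index",     // <3> the destination index
        queryConfig, pivotConfig);
```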

[id="{upid}-{api}-query-config"]
==== QueryConfig

The query with which to select data from the source index.


Just "source"? As "source" is an expression that can resolve to more than one index, I would try to avoid the term "index" where possible.


==== PivotConfig

Defines the pivot transform `group by` fields and the aggregation to reduce the data.


All of this together is called a transform. I used to call pivot the function of the transform, so I suggest: "Defines the pivot function ..."


* Terms
* Histogram
* Date Historgram


typo Historgram -> Histogram


===== GroupConfig
The grouping terms. Defines the group by and destination fields
which are produced by the grouping transform. There are 3 types of


"grouping transform": do you mean "pivot transform" or, as I suggested above, "pivot function"?

===== AggregationConfig

Defines the aggregations for the group fields.
The aggregation must be one of `avg`, `min`, `max` or `sum`.


We also support cardinality and value_of. I wonder, however, how we approach documenting this. The above would mean we have to change this place for every new aggregation we add, which seems easy to forget.

Would it be better to e.g. have a separate page "Supported Aggregations for DataFrame Transforms" which we link to in this place?

@lcawl any idea/best practice?
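As an illustration of the GroupConfig and AggregationConfig pieces discussed above, a pivot that groups on a terms field and averages a numeric field might look like this (a sketch; the group-source class name and constructor shapes are assumptions):

```java
// Sketch: group by the terms of "customer_id" and compute avg("price").
GroupConfig groupConfig = new GroupConfig(Collections.singletonMap(
        "customer", new TermsGroupSource("customer_id")));  // terms grouping
AggregationConfig aggConfig = new AggregationConfig(
        AggregationBuilders.avg("avg_price").field("price")); // one of avg, min, max, sum
PivotConfig pivotConfig = new PivotConfig(groupConfig, aggConfig);
```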

@davidkyle
Member Author

I changed the docs to use 'data frame transform' and addressed the other comments

@davidkyle changed the title from "[ML-Dataframe] Add Data Fame client to the Java HLRC" to "[ML-Dataframe] Add Data Frame client to the Java HLRC" on Mar 12, 2019

@hendrikmuhs left a comment

LGTM

@davidkyle davidkyle merged commit 2cb8b65 into elastic:master Mar 14, 2019
@davidkyle davidkyle deleted the df-hlrc-client branch March 14, 2019 10:06
davidkyle added a commit to davidkyle/elasticsearch that referenced this pull request Mar 14, 2019
Adds DataFrameClient to the Java HLRC and implements PUT and DELETE data frame transform.
7 participants