[ML-Dataframe] Add Data Frame client to the Java HLRC #39921
Conversation
Pinging @elastic/ml-core
Pinging @elastic/es-core-features
[id="{upid}-{api}"]
=== Put Data Frame Transform API

The Put Data Frame Transform API is used to create a new {dataframe-job}.
I am not sure how {dataframe-job} is defined in the docs. How do these types of macros work?
According to the link this resolves to a "data frame analytics job", which would point to the wrong docs. I think we need new macros: {dataframe-transform} or {dataframe-transform-job} - whatever we choose should be consistent everywhere. Because this is called "Put Data Frame Transform API" it would make sense to use {dataframe-transform}.
'data frame analytics job' is a mouthful. I raised elastic/docs#700.
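For context (my own sketch, not part of this PR): references like `{dataframe-job}` are AsciiDoc document attributes, defined once in a shared attributes file and substituted at build time wherever the attribute reference appears. A hypothetical definition of the proposed new macro might look like:

```asciidoc
// Hypothetical attribute definitions; the actual attribute names and the
// shared file used by the Elasticsearch docs build may differ.
:dataframe-transform: data frame transform
:dataframe-transforms: data frame transforms

// Usage in a page body:
The Put Data Frame Transform API is used to create a new {dataframe-transform}.
```

Defining the term once this way would keep the wording consistent everywhere it is used.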
I added some comments and open questions.
import static org.elasticsearch.common.xcontent.XContentFactory.jsonBuilder;

public class DataFrameIT extends ESRestHighLevelClientTestCase {
nit: DataFrameTransformIT ?
ack = execute(new DeleteDataFrameTransformRequest(transform.getId()), client::deleteDataFrameTransform,
        client::deleteDataFrameTransformAsync);
assertTrue(ack.isAcknowledged());
nit: Would be good to test that e.g. another delete of the same transform throws an error.
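A rough sketch of that suggested negative test, as a fragment inside the PR's `DataFrameIT` (it reuses the PR's `execute` helper; the exception type and expected status are my assumptions, since the actual error behavior isn't shown in this thread):

```java
// Hypothetical fragment for DataFrameIT: a second delete of the same
// transform id should fail rather than be silently acknowledged.
ElasticsearchStatusException e = expectThrows(ElasticsearchStatusException.class,
    () -> execute(new DeleteDataFrameTransformRequest(transform.getId()),
        client::deleteDataFrameTransform, client::deleteDataFrameTransformAsync));
assertThat(e.status(), is(RestStatus.NOT_FOUND));
```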
import static org.elasticsearch.common.xcontent.XContentFactory.jsonBuilder;

public class DataFrameDocumentationIT extends ESRestHighLevelClientTestCase {
"DataFrameTransformDocumentation" ?
<2> The source index or index pattern
<3> The destination index
<4> Optionally a QueryConfig
<5> The PivotConfig
We could make this somewhat future proof, e.g. "The configuration object of the function; in this version we only support the pivot function." (please help me with the wording). I suggest calling the inner "thing" the "function" of the transform. We briefly discussed this once; it somehow fits in my opinion, but I am open to other suggestions.
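To illustrate the "function" framing discussed above, here is a rough sketch of what a PUT data frame transform request body might look like at this stage of the PR. The index names, field names, and exact JSON keys are illustrative, not taken from this changeset:

```json
{
  "source": "kibana_sample_data_ecommerce",
  "dest": "ecommerce_by_customer",
  "query": { "match_all": {} },
  "pivot": {
    "group_by": {
      "customer_id": { "terms": { "field": "customer_id" } }
    },
    "aggregations": {
      "total_spent": { "sum": { "field": "taxful_total_price" } }
    }
  }
}
```

Under that framing, `pivot` is the single "function" object of the transform, wrapping both the GroupConfig (`group_by`) and the AggregationConfig (`aggregations`).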
[id="{upid}-{api}-query-config"]
==== QueryConfig

The query with which to select data from the source index.
Just "source"? As "source" is an expression that can resolve to more than one index, I would try to avoid the term "index" where possible.
==== PivotConfig

Defines the pivot transform `group by` fields and the aggregation to reduce the data.
All together is called a transform. I used to call pivot the function of the transform, therefore I suggest: "Defines the pivot function ..."
* Terms
* Histogram
* Date Historgram
typo: Historgram -> Histogram
===== GroupConfig
The grouping terms. Defines the group by and destination fields
which are produced by the grouping transform. There are 3 types of
"grouping transform" - you mean "pivot transform", or, as I suggested above, "pivot function".
===== AggregationConfig

Defines the aggregations for the group fields.
The aggregation must be one of `avg`, `min`, `max` or `sum`.
We also support cardinality and value_of - I wonder, however, how we approach documenting this. The above would mean we have to change this place for every new aggregation we add, which seems easy to forget. Would it be better to e.g. have a separate page "Supported Aggregations for DataFrame Transforms" which we link to in this place? @lcawl any idea/best practice?
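For illustration of the point above, an AggregationConfig fragment could mix the currently documented aggregations with ones like cardinality (a sketch; field names are invented, and the exact set of supported aggregations is what this thread is debating):

```json
{
  "aggregations": {
    "avg_price":       { "avg":         { "field": "taxful_total_price" } },
    "unique_products": { "cardinality": { "field": "products.product_id" } }
  }
}
```

A separate "supported aggregations" page would let this fragment's docs link out rather than enumerate the list inline.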
I changed the docs to use 'data frame transform' and addressed the other comments.
Force-pushed from 2514ef4 to 684b320
LGTM
Adds DataFrameClient to the Java HLRC and implements PUT and DELETE data frame transform. The documentation needs fleshing out with descriptions of the data frame config objects and examples.