Emit predicted category using an appropriate JSON type. #877
0825066 to 1eb3b44
If you'd rather use standard library functions to do the string-to-number conversions instead of my core::CStringUtils suggestions, then that's fine, but please make sure that exceptions won't fail the entire analysis, that all possible values a 64-bit signed integer can hold are covered on all platforms, and that we log when there are unexpected conversion errors.
LGTM (modulo adhering to naming conventions).
lib/api/unittest/CDataFrameTrainBoostedTreeClassifierRunnerTest.cc
… the field is really used in C++ code
You might think that problem 1 can be solved by using stol instead of stoi, but this is not the case: a Java long is always 64 bits, whereas a C++ long is only 32 bits on Windows. The C++ type that reliably corresponds to Java's long is int64_t.
Sounds like you've gone through these kinds of issues before. Thanks for sharing. I'll gladly use the library functions for the conversions.
Additionally, I've renamed dependent_variable_type to prediction_field_type, as that's how the field is really used in the code.
lib/api/unittest/CDataFrameTrainBoostedTreeClassifierRunnerTest.cc
LGTM
run elasticsearch-ci
@przemekwitek to get the tests to pass you need to run clang-format:
Thanks for the hint. I'm wondering if it would be possible to make this message stand out more in the console output. Yesterday, before your hint, I was looking for the failure reason but couldn't find it.
It gets printed from here: ml-cpp/dev-tools/check-style.sh Lines 68 to 72 in 881e5d0
You are welcome to open a PR to change that so that it stands out more. If you want to use something like escape sequences to change colours, you can use the PR build to check that the escape sequences are interpreted correctly by Jenkins: make a PR that messes up the formatting in a source file and modifies
Currently, classification analysis allows a dependent variable of integer or boolean type, but in the results field the prediction field is always emitted as a JSON string (so true becomes "true", 1 becomes "1", etc.).
A solution to that problem is to pass the desired prediction_field_type from Java to C++ and make C++ emit a bool, int or string JSON field depending on the prediction_field_type passed.
This PR implements the C++ part of this solution.
Relates elastic/elasticsearch#49796