[R] Fix parsing decision stump. #7689

trivialfis · 2022-02-21T23:52:32Z

Fix in dataframe construction.
Fix in plotting.

Still need tests for more sophisticated cases like mixed tree types.

Close #7669

trivialfis · 2022-03-14T11:06:57Z

@hetong007 Could you please take a look at this fix? I'm a little confused by the regex parsing.

hetong007 · 2022-03-15T02:10:18Z

Sure. What exactly is the confusing part?

trivialfis · 2022-03-15T05:35:16Z

@hetong007 For instance this comment in the code:

skip some indices with spurious capture groups from anynumber_regex

I'm not entirely sure what it is skipping. In general, I would have more confidence in the correctness of the code if we used the JSON dump along with jsonlite for parsing. But right now the R API lets the user pass a text dump directly as a string, so I can't make that change without breaking compatibility.

trivialfis · 2022-03-15T05:36:43Z

Would be great if you can review the PR and see if the fix makes sense.

hetong007 · 2022-03-16T11:16:33Z

I looked into the code with an example.

The to-parse text is "f0<0.5] yes=1,no=2,missing=1,gain=0.0366666317,cover=2", with branch_rx="f(\\d+)<([-+]?[0-9]*\\.?[0-9]+([eE][-+]?[0-9]+)?)\\] yes=(\\d+),no=(\\d+),missing=(\\d+),gain=([-+]?[0-9]*\\.?[0-9]+([eE][-+]?[0-9]+)?),cover=([-+]?[0-9]*\\.?[0-9]+([eE][-+]?[0-9]+)?)".

After the regmatches call, we have the results as

[[1]]
 [1] "f0<0.5] yes=1,no=2,missing=1,gain=0.0366666317,cover=2"
 [2] "0"
 [3] "0.5"
 [4] ""
 [5] "1"
 [6] "2"
 [7] "1"
 [8] "0.0366666317"
 [9] ""
[10] "2"
[11] ""

So basically the skipped indices contain either the full input or an empty string, and the index vector c(2, 3, 5, 6, 7, 8, 10) just slices out the informative ones.
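The slicing described above can be reproduced with a minimal base R sketch. The text and pattern are copied from this comment; the variable name anynumber_regex is taken from the quoted code comment, while the field names (feature, threshold, ...) and the second input line are illustrative assumptions, not the names used in the package.

```r
# Number pattern with an optional exponent; the exponent part is itself a
# capture group, which is where the "spurious" entries come from.
anynumber_regex <- "([-+]?[0-9]*\\.?[0-9]+([eE][-+]?[0-9]+)?)"
branch_rx <- paste0(
  "f(\\d+)<", anynumber_regex,
  "\\] yes=(\\d+),no=(\\d+),missing=(\\d+),",
  "gain=", anynumber_regex, ",cover=", anynumber_regex
)

txt <- "f0<0.5] yes=1,no=2,missing=1,gain=0.0366666317,cover=2"
m <- regmatches(txt, regexec(branch_rx, txt))[[1]]

# m[1] is the whole match; m[4], m[9] and m[11] are the exponent groups,
# which are empty here because no number uses exponent notation.
keep <- c(2, 3, 5, 6, 7, 8, 10)
fields <- m[keep]
# Illustrative names only (assumption, not the package's column names).
names(fields) <- c("feature", "threshold", "yes", "no",
                   "missing", "gain", "cover")
print(fields)

# Hypothetical input with an exponent: the extra group is now non-empty,
# but it only repeats part of the full number captured in m2[3].
txt2 <- "f3<1e-05] yes=1,no=2,missing=1,gain=0.5,cover=2"
m2 <- regmatches(txt2, regexec(branch_rx, txt2))[[1]]
print(m2[3:4])
```

Either way, the dropped positions carry nothing beyond what the kept ones already hold, so the fixed index vector stays valid for both plain and exponent notation.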

Overall I think your fix here makes sense, with one inline comment.

R-package/R/xgb.model.dt.tree.R

trivialfis · 2022-03-16T14:27:45Z

@hetong007 Thank you for digging into this! Your explanation of the code is helpful.

On the issue of using the constant NA instead of the string value "NA": I reverted that change because, at the end of the function, each column is converted to the correct type using as.numeric and as.integer. We need to use the string value "NA" first, because the parsed values are strings and must be consistent with the columns initialized with the string "NA".
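As a rough illustration of that flow, here is a minimal base R sketch. It assumes a plain data.frame and hypothetical column names (threshold, yes), and it is not the package's actual code; it only shows string placeholders being turned into real missing values by the final conversion step.

```r
# Every parsed field starts out as a string, including the "NA" placeholder
# used for rows (such as leaves) that have no value for a column.
nodes <- data.frame(
  threshold = c("0.5", "NA"),  # hypothetical column: second row is a leaf
  yes       = c("1", "NA"),    # hypothetical column
  stringsAsFactors = FALSE
)

# Final step: convert each column to its intended type in one pass.
# The string "NA" becomes a missing value of the target type.
nodes$threshold <- as.numeric(nodes$threshold)
nodes$yes <- as.integer(nodes$yes)

str(nodes)
```

Keeping the columns as character until this last step means parsed values and placeholders share one type throughout the parsing code.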

hetong007 · 2022-03-16T15:02:50Z

Cool! This makes sense.

trivialfis added 2 commits February 25, 2022 14:23

[R] Fix parsing decision stump.

5b3e003

Check explicitly

120dd68

trivialfis force-pushed the fix-R-parse-dump branch from 30e73a7 to 120dd68 Compare February 25, 2022 08:39

trivialfis changed the title ~~[WIP] [R] Fix parsing decision stump.~~ [R] Fix parsing decision stump. Feb 28, 2022

trivialfis marked this pull request as ready for review February 28, 2022 08:41

hetong007 reviewed Mar 16, 2022

R-package/R/xgb.model.dt.tree.R (resolved)

Use constant instead of string.

c41c451

hetong007 approved these changes Mar 16, 2022

Revert "Use constant instead of string."

99a670f

This reverts commit c41c451.

trivialfis merged commit da35162 into dmlc:master Mar 16, 2022

trivialfis deleted the fix-R-parse-dump branch March 16, 2022 17:08
