FIX: bug when visualizing byte nodes #352
Conversation
The visualize function failed when trying to visualize models that had byte attributes. This is because the node's children would contain raw bytes, which the function doesn't know how to visualize. Therefore, children of byte and byte array nodes are now skipped (in addition to the node types already being skipped). To test this bug, I added a visualization test to the external packages like LightGBM, which make use of bytes. I'm not sure if some of the sklearn estimators could also be candidates, but it would surely be overkill to test visualizing all of them, whereas the overhead is not so big for the external packages.
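The fix described above can be sketched as follows. This is a hypothetical reconstruction, not the actual skops implementation: `walk`, the node dictionaries, and the tree layout are all illustrative, but it shows the core idea of skipping raw-bytes children during the tree walk instead of trying to recurse into them.

```python
# Hypothetical sketch of the fix: when walking the node tree for
# visualization, children that are raw bytes (or bytearrays) are
# skipped instead of being recursed into, which previously crashed.
def walk(node, depth=0, lines=None):
    """Collect one printable line per node, skipping raw byte children."""
    if lines is None:
        lines = []
    lines.append("  " * depth + node["name"])
    for child in node.get("children", []):
        # raw bytes cannot be visualized as a child node -> skip
        if isinstance(child, (bytes, bytearray)):
            continue
        walk(child, depth + 1, lines)
    return lines

# illustrative tree mimicking a LightGBM model with a raw-bytes child
tree = {
    "name": "LGBMClassifier",
    "children": [
        {"name": "booster", "children": [b"\x00\x01raw-model-bytes"]},
    ],
}
print("\n".join(walk(tree)))
```

Without the `isinstance` check, the recursive call would try to index into the `bytes` object as if it were a node dictionary and fail.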
@skops-dev/maintainers ready for review. I had to set the
Truncate the number of bytes shown at 24.
Calling getvalue does not require a seek(0) afterwards.
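The point about `getvalue` can be demonstrated directly with `io.BytesIO`: `getvalue()` returns the entire buffer regardless of the current stream position, whereas `read()` starts at the current position and therefore does need a `seek(0)` after a write.

```python
import io

buf = io.BytesIO()
buf.write(b"model-bytes")

# getvalue() returns the full buffer regardless of the current position,
# so no seek(0) is needed beforehand.
assert buf.getvalue() == b"model-bytes"

# read(), by contrast, starts at the current position (the end, after
# the write), so it returns nothing until we rewind.
assert buf.read() == b""
buf.seek(0)
assert buf.read() == b"model-bytes"
```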
@adrinjalali I addressed your comment, please review again
    assert_method_outputs_equal(estimator, loaded, X)
    visualize(dumped, trusted=trusted)
these are going to print a bunch of stuff in the terminal when we run tests, shouldn't we be capturing the output? It also makes me realize, maybe we want to have an output arg for `visualize`?
> these are going to print a bunch of stuff in the terminal when we run tests

pytest automatically captures stdout unless used with the `-s` option, no?

> maybe we want to have an output arg for `visualize`?

What would that argument do? We have the `sink` argument, in case someone wants to push the output to a logger, for instance.
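Pushing the output to a logger via a sink could look roughly like the sketch below. The exact signature skops expects for `sink` is an assumption here; treat `logger_sink` and the commented-out `visualize` call as illustrative, not the real API.

```python
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("skops.visualize")

collected = []

def logger_sink(line, *args, **kwargs):
    # assumed sink signature: receives each formatted line; the real
    # skops interface may pass different arguments
    collected.append(line)
    logger.info(line)

# hypothetical call: visualize(dumped, trusted=trusted, sink=logger_sink)
logger_sink("root: LGBMClassifier")
```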
`sink` is not trivial to implement, it'd be nice if one could pass something like `os.devnull` and the output would be forwarded there.

I know by default pytest captures output, but even when we pass `-s` right now, there's almost no output, and this adds quite a bit there. If you think it's not worth fixing, I'm happy to merge as is.
> `sink` is not trivial to implement, it'd be nice if one could pass something like `os.devnull` and the output would be forwarded there.

You should still be able to change `sys.stdout`, right?

> I know by default pytest captures output, but even when we pass `-s` right now, there's almost no output, and this adds quite a bit there.

I see. For `test_external.py`, I made a change to not print anything.
CI is green, please review again
skops/io/tests/test_external.py (Outdated)
    def _null(*args, **kwargs):
        # used to prevent printing anything to stdout when calling visualize
        pass
passing this as `sink` is disabling a fair amount of code, not just printing. Are you sure you want this?
This is true. The relevant part of the code is executed, though, i.e. the tests would fail without the bugfix. If you want me to still change it, I can think of something (I guess an auto fixture that overrides `sys.stdout`, not sure if it conflicts with pytest).
yes, but you're also specifically testing this bug anyway. So it'd be nice to have `visualize` actually test the whole thing, in case later we add something which would break things.
Okay, makes sense, done.
More of the visualize stack is running this way, improving test coverage.
@@ -136,6 +163,14 @@ class TestXGBoost:
    """

    @pytest.fixture(autouse=True)
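A minimal sketch (without a pytest dependency) of what an autouse fixture that overrides `sys.stdout` achieves: everything printed inside the test, including `visualize()` output, lands in an in-memory buffer instead of the terminal, even under `pytest -s`. The `StdoutSilencer` context manager is invented for illustration; in the real test suite this logic lives in a `@pytest.fixture(autouse=True)`.

```python
import io
import sys

class StdoutSilencer:
    """Swap sys.stdout for a StringIO buffer for the duration of a block."""

    def __enter__(self):
        self._old = sys.stdout
        sys.stdout = io.StringIO()
        return sys.stdout

    def __exit__(self, *exc):
        # always restore the original stream, even on error
        sys.stdout = self._old
        return False

with StdoutSilencer() as buf:
    print("node tree ...")  # captured, never reaches the terminal

assert "node tree" in buf.getvalue()
```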
TIL
And TIL about the `redirect_stdout` context manager, but it doesn't work here :D (I guess because of pytest)
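Outside of pytest's own output capturing, `contextlib.redirect_stdout` does behave as one would expect:

```python
import contextlib
import io

buf = io.StringIO()
with contextlib.redirect_stdout(buf):
    print("captured line")  # goes into buf, not the terminal

assert buf.getvalue() == "captured line\n"
```

Under pytest, the framework may have already replaced `sys.stdout` with its own capture object, which can interact oddly with a second layer of redirection, consistent with what the commenter observed.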