FIX: bug when visualizing byte nodes #352
Conversation
The visualize function failed when trying to visualize models that had byte attributes. This is because the node's children would contain raw bytes, which the function doesn't know how to visualize. Therefore, children of byte and byte array nodes are now skipped (in addition to the node types already being skipped). To test this bug, I added a visualization test to the external packages like LightGBM, which make use of bytes. I'm not sure if some of the sklearn estimators could also be candidates, but it would surely be overkill to test visualizing all of them, whereas the overhead is not so big for the external packages.
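The fix described above can be sketched as follows. This is a hypothetical reconstruction, not the actual skops implementation: `walk`, the node dictionaries, and the tree layout are all illustrative, but it shows the core idea of skipping raw-bytes children during the tree walk instead of trying to recurse into them.

```python
# Hypothetical sketch of the fix: when walking the node tree for
# visualization, children that are raw bytes (or bytearrays) are
# skipped instead of being recursed into, which previously crashed.
def walk(node, depth=0, lines=None):
    """Collect one printable line per node, skipping raw byte children."""
    if lines is None:
        lines = []
    lines.append("  " * depth + node["name"])
    for child in node.get("children", []):
        # raw bytes cannot be visualized as a child node -> skip
        if isinstance(child, (bytes, bytearray)):
            continue
        walk(child, depth + 1, lines)
    return lines

# illustrative tree mimicking a LightGBM model with a raw-bytes child
tree = {
    "name": "LGBMClassifier",
    "children": [
        {"name": "booster", "children": [b"\x00\x01raw-model-bytes"]},
    ],
}
print("\n".join(walk(tree)))
```

Without the `isinstance` check, the recursive call would try to index into the `bytes` object as if it were a node dictionary and fail.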
@skops-dev/maintainers ready for review. I had to set the
Truncate the number of bytes shown at 24.
Calling getvalue does not require a seek(0) afterwards.
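The point about `getvalue` can be demonstrated directly with `io.BytesIO`: `getvalue()` returns the entire buffer regardless of the current stream position, whereas `read()` starts at the current position and therefore does need a `seek(0)` after a write.

```python
import io

buf = io.BytesIO()
buf.write(b"model-bytes")

# getvalue() returns the full buffer regardless of the current position,
# so no seek(0) is needed beforehand.
assert buf.getvalue() == b"model-bytes"

# read(), by contrast, starts at the current position (the end, after
# the write), so it returns nothing until we rewind.
assert buf.read() == b""
buf.seek(0)
assert buf.read() == b"model-bytes"
```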
@adrinjalali I addressed your comment, please review again
    assert_method_outputs_equal(estimator, loaded, X)
    visualize(dumped, trusted=trusted)
these are going to print a bunch of stuff in the terminal when we run tests, shouldn't we be capturing the output? It also makes me realize, maybe we want to have an output arg for `visualize`?
> these are going to print a bunch of stuff in the terminal when we run tests

pytest automatically captures stdout unless used with the `-s` option, no?

> maybe we want to have an output arg for `visualize`?

What would that argument do? We have the `sink` argument, in case someone wants to push the output to a logger, for instance.
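Pushing the output to a logger via a sink could look roughly like the sketch below. The exact signature skops expects for `sink` is an assumption here; treat `logger_sink` and the commented-out `visualize` call as illustrative, not the real API.

```python
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("skops.visualize")

collected = []

def logger_sink(line, *args, **kwargs):
    # assumed sink signature: receives each formatted line; the real
    # skops interface may pass different arguments
    collected.append(line)
    logger.info(line)

# hypothetical call: visualize(dumped, trusted=trusted, sink=logger_sink)
logger_sink("root: LGBMClassifier")
```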
`sink` is not trivial to implement, it'd be nice if one could pass something like `os.devnull` and the output would be forwarded there.

I know by default pytest captures output, but even when we pass `-s` right now, there's almost no output, and this adds quite a bit there. If you think it's not worth fixing, I'm happy to merge as is.
> `sink` is not trivial to implement, it'd be nice if one could pass something like `os.devnull` and the output would be forwarded there.

You should still be able to change `sys.stdout`, right?

> I know by default pytest captures output, but even when we pass `-s` right now, there's almost no output, and this adds quite a bit there.

I see. For `test_external.py`, I made a change to not print anything.
CI is green, please review again
skops/io/tests/test_external.py (Outdated)
    def _null(*args, **kwargs):
        # used to prevent printing anything to stdout when calling visualize
        pass
passing this as `sink` is disabling a fair amount of code, not just printing. Are you sure you want this?
This is true. The relevant part of the code is executed, though, i.e. the tests would fail without the bugfix. If you want me to still change it, I can think of something (I guess an auto fixture that overrides `sys.stdout`, not sure if it conflicts with pytest).
yes, but you're also specifically testing this bug anyway. So it'd be nice to have `visualize` actually test the whole thing, in case later we add something which would break things.
Okay, makes sense, done.
More of the visualize stack is running this way, improving test coverage.
@@ -136,6 +163,14 @@ class TestXGBoost:
    """

    @pytest.fixture(autouse=True)
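A minimal sketch (without a pytest dependency) of what an autouse fixture that overrides `sys.stdout` achieves: everything printed inside the test, including `visualize()` output, lands in an in-memory buffer instead of the terminal, even under `pytest -s`. The `StdoutSilencer` context manager is invented for illustration; in the real test suite this logic lives in a `@pytest.fixture(autouse=True)`.

```python
import io
import sys

class StdoutSilencer:
    """Swap sys.stdout for a StringIO buffer for the duration of a block."""

    def __enter__(self):
        self._old = sys.stdout
        sys.stdout = io.StringIO()
        return sys.stdout

    def __exit__(self, *exc):
        # always restore the original stream, even on error
        sys.stdout = self._old
        return False

with StdoutSilencer() as buf:
    print("node tree ...")  # captured, never reaches the terminal

assert "node tree" in buf.getvalue()
```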
TIL
And TIL about the `redirect_stdout` context manager, but it doesn't work here :D (I guess because of pytest)
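Outside of pytest's own output capturing, `contextlib.redirect_stdout` does behave as one would expect:

```python
import contextlib
import io

buf = io.StringIO()
with contextlib.redirect_stdout(buf):
    print("captured line")  # goes into buf, not the terminal

assert buf.getvalue() == "captured line\n"
```

Under pytest, the framework may have already replaced `sys.stdout` with its own capture object, which can interact oddly with a second layer of redirection, consistent with what the commenter observed.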