Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added documentation for assertion on non-empty nulls [skip ci] #8271

Closed
wants to merge 2 commits into from

Conversation

razajafri
Copy link
Collaborator

This PR attempts at updating documentation to add recently added assertion on non-empty.

fixes #8238

Signed-off-by: Raza Jafri <rjafri@nvidia.com>
@razajafri razajafri requested a review from revans2 May 10, 2023 17:56
Signed-off-by: Raza Jafri <rjafri@nvidia.com>
@@ -15,6 +15,14 @@ problem. Please also look at the current list of
[bugs](https://github.com/NVIDIA/spark-rapids/issues?q=is%3Aopen+is%3Aissue+label%3Abug) which are
typically incompatibilities that we have not yet addressed.

## Non-empty nulls

The SQL plugin doesn't support nested types to have non-empty nulls and if data being processed
Copy link
Collaborator

@abellina abellina May 10, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not 100% sure that this belongs to compatibility.md

The reason is that this is internal and only for the tests (actually this text doesn't mention testing.. I would make sure we qualify that this is only for tests).

Would CONTRIBUTING.md be a better fit for this?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't internal. Anyone passing data through our plugin that contains non-empty nulls will see this AssertionError unless they turn off assertions using the above mentioned Java command line parameter

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How can the user control this???? Assertions are also off by default in java. You have to enable them with -ea or something similar. This is not a user facing problem.

If they pass in arrow data and we get an error because of this, then we have a BUG in our arrow translation. If they use the java APIs directly to write UDFs, then I can see it, but it should be a part of the docs for the CUDF java APIs. Not here, and especially not as the first thing in the compatibility docs.

Copy link
Collaborator Author

@razajafri razajafri May 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So we are asking two questions here.

  1. Is adding this documentation needed at all?

I think it does until we have fixed the mergeAndSetValidity method and everywhere else where we can be introducing non-empty nulls. I wasn't sure if my fix for mergeAndSetValidity will make it in before 23.06 is released and until it does I imagine a user who has enabled assertions and consuming arrow data will see this.

  1. Where does this belong?

I wasn't sure where to put it, I think it belongs in compatibility because it has something to do with consuming external data which can be considered valid by some 3rd party tools like Arrow. We can put it wherever we have a consensus.

Another question that arises is if the user doesn't turn on asserts how does the user know that this could result in bad results if they consume from e.g. Arrow?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The assertions are off by default. The only time that a user would see this is if they are running unit tests against our code or enable assertions themselves for some reason and happen to run into one of these problems.

I think this needs to be in an FAQ and not compatibility. This is not a difference between Apache Spark with and without our plugin. Turning on our plugin will still produce the same results as before.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this needs to be in an FAQ

I would argue this doesn't meet the "frequently" part of FAQ. Agree with Bobby that the plugin will do the same thing as Apache Spark here.

What, specifically, are we worried about here that users really need to know about? What is the use case where users would encounter it? If there are no use cases, then I don't see why we need to put this in user-facing documentation.

@@ -15,6 +15,14 @@ problem. Please also look at the current list of
[bugs](https://github.com/NVIDIA/spark-rapids/issues?q=is%3Aopen+is%3Aissue+label%3Abug) which are
typically incompatibilities that we have not yet addressed.

## Non-empty nulls

The SQL plugin doesn't support nested types to have non-empty nulls and if the data being processed
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"non-empty nulls" sounds cryptic. Can you rephrase or illustrate with an example?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I agree. I can't think of another word, let me explain with an example

@sameerz sameerz added the documentation Improvements or additions to documentation label May 11, 2023
Copy link
Collaborator

@revans2 revans2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not something that user needs to worry about or we have done our job incorrectly (aka we have a bug). The user has no control over non-empty nulls. So telling them about this is confusing. The assertions should also be off by default in production and only enabled during our tests. This documentation should be part of developer docs. Not here.

@razajafri
Copy link
Collaborator Author

This doc change is not needed as assertions are disabled by default

@razajafri razajafri closed this May 22, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation Improvements or additions to documentation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[DOC] Update documentation about assertion for non-empty nulls
6 participants