Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change Cast.toString as "cast" instead of "ansi_cast" under ANSI mode #5047

Merged

Conversation

HaoYang670
Copy link
Collaborator

Signed-off-by: remzi 13716567376yh@gmail.com

close #4870

Signed-off-by: remzi <13716567376yh@gmail.com>
@HaoYang670
Copy link
Collaborator Author

build

@HaoYang670 HaoYang670 added the audit_3.3.0 Audit related tasks for 3.3.0 label Mar 25, 2022
@@ -1544,11 +1544,7 @@ case class GpuCast(
}
}

override def toString: String = if (ansiMode) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add a test for this? I think this is something that needs to go into the shim layer since it varies by Spark version.

This is the Spark 3.1.x implementation:

  override def toString: String = {
    val ansi = if (ansiEnabled) "ansi_" else ""
    s"${ansi}cast($child as ${dataType.simpleString})"
  }

@sameerz sameerz added this to the Mar 21 - Apr 1 milestone Mar 26, 2022
Copy link
Contributor

@andygrove andygrove left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like the name change only impacts Spark 3.3.0 and later so we need to introduce shim code to determine the name for this expression

@revans2
Copy link
Collaborator

revans2 commented Mar 28, 2022

It looks like the name change only impacts Spark 3.3.0 and later so we need to introduce shim code to determine the name for this expression

The change to make it ansi vs not in the text was introduced in 3.0.0 and reverted in 3.3.0. For me it is so minor of a change that I just don't think it is worth shimming it/testing it. But I could be convinced if someone is actually parsing the text returned by this.

@andygrove
Copy link
Contributor

I also doubt that users would be relying on the name of the expression. At some point, it would be nice if our tests compared the schema of results from GPU and CPU and not just the results themselves. That would automate some of our audit work. I'll remove the change request.

@revans2
Copy link
Collaborator

revans2 commented Mar 28, 2022

@andygrove

At some point, it would be nice if our tests compared the schema of results from GPU and CPU and not just the results themselves.

That is a great point and is something that should be super simpler to add into asserts.py because we are passing in the dataframe. For the most part there is a clear mapping between the SQL type and a corresponding python type that is returned, but in a few cases it is ambiguous and we really should fix those cases. If you want to file an issue for that please do it. If not I will. You cannot see it, but I am face palming after I looked and realized that I forgot to add that into the integration tests when I first wrote them.

@andygrove
Copy link
Contributor

Thanks. I filed #5072 for improving the tests

@andygrove andygrove merged commit 3d1084c into NVIDIA:branch-22.06 Apr 4, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
audit_3.3.0 Audit related tasks for 3.3.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[FEA][Audit] [SPARK-38251][SQL]- Change Cast.toString as "cast" instead of "ansi_cast" under ANSI mode
4 participants