Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEA]Audit - [SPARK-34605][SQL] Support java.time.Duration as an external type of the day-time interval type #1948

Closed
tgravescs opened this issue Mar 16, 2021 · 6 comments
Assignees
Labels
audit_3.2.0 P2 Not required for release task Work required that improves the product but is not user facing

Comments

@tgravescs
Copy link
Collaborator

tgravescs commented Mar 16, 2021

Is your feature request related to a problem? Please describe.
spark made a change to support java.time.Duration for SPARK-27790 Support ANSI SQL INTERVAL types

apache/spark@17601e0

It looks like we copied the getConverterForType function in GpuRowToColumnarExec so we should look closer at pulling that in.

https://issues.apache.org/jira/browse/SPARK-34605

@tgravescs tgravescs added feature request New feature or request ? - Needs Triage Need team to review and classify audit_3.2.0 labels Mar 16, 2021
@sameerz
Copy link
Collaborator

sameerz commented Apr 6, 2021

@tgravescs will review with @revans2

@tgravescs
Copy link
Collaborator Author

tgravescs commented Apr 6, 2021

getConverterFortype is used in GpuRowToColumnarExec:
https://github.com/NVIDIA/spark-rapids/blob/branch-0.5/sql-plugin/src/main/scala/com/nvidia/spark/rapids/GpuRowToColumnarExec.scala#L94

I don't know if there is already another lira to add support for DayTimeIntervalType. I assume we need to add it to our isSupportedType checks and so forth as well. Although we may just let it fall though, do we have a standard for this @revans2 ?

@sameerz sameerz added task Work required that improves the product but is not user facing P0 Must have for release and removed ? - Needs Triage Need team to review and classify feature request New feature or request labels Apr 13, 2021
@sameerz
Copy link
Collaborator

sameerz commented Apr 13, 2021

P1 for whether we support time.Duration for the cached batch serializer. P3 for whether we are going to support time.Duration.

@revans2
Copy link
Collaborator

revans2 commented Sep 3, 2021

There are a number of new types that have been added into spark and if we run into any of them we fall back to the CPU for it. Eventually we want to add in more types, but that should not block any of the existing code/queries from working.

@revans2 revans2 added P2 Not required for release and removed P0 Must have for release labels Sep 3, 2021
@res-life res-life self-assigned this Mar 8, 2022
@res-life
Copy link
Collaborator

Duration is an external type, Plugin do not need to handle it.
The Duration is used when creating a df, see below:

schema = StructType([
    StructField("c1", DayTimeIntervalType(
                DayTimeIntervalType.DAY, DayTimeIntervalType.DAY
            ), False),
])
data = [(timedelta( days=50, seconds=27), ),]
df = spark.createDataFrame(
        SparkContext.getOrCreate().parallelize(data, numSlices=2),
        schema)

day-time interval type is stored as int64 internally in Spark,
and Spark also stores it as int64 to parquet file along with an metada which indicating the mapping between day-time interval and int64

This issue supports ANSI intervals to/from Parquet:
#4145

Can we close it? @revans2

@revans2
Copy link
Collaborator

revans2 commented Mar 16, 2022

I think we can close this because we can do columnar to row and row to columnar on these types.

@revans2 revans2 closed this as completed Mar 16, 2022
@sameerz sameerz added wontfix This will not be worked on and removed wontfix This will not be worked on labels Apr 2, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
audit_3.2.0 P2 Not required for release task Work required that improves the product but is not user facing
Projects
None yet
Development

No branches or pull requests

4 participants