Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rewrite nvl function #9413

Closed
wants to merge 1 commit into from
Closed

rewrite nvl function #9413

wants to merge 1 commit into from

Conversation

guojidan
Copy link
Contributor

@guojidan guojidan commented Mar 1, 2024

Which issue does this PR close?

Closes #9365 .

Rationale for this change

now, nvl2 function is supported, so we don't need keep nvl code, we can rewrite nvl to nvl2

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

@github-actions github-actions bot added the sql SQL Planner label Mar 1, 2024
@jonahgao
Copy link
Member

jonahgao commented Mar 1, 2024

I think that this manner of rewriting might have several downsides that need to be considered.

  1. Because NVL no longer exists in the function list, some of its features will be missing. For example, the feature in issue Add ScalarUDFs in missing function hints / suggested errors #9392, and the hint after entering wrong parameters.

On this PR's branch:

DataFusion CLI v36.0.0
❯ select nvl(1);
Error during planning: Invalid function 'nvl'.
Did you mean 'nanvl'?

The hint on the main branch would be better.

❯ select nvl(1);
Error during planning: No function matches the given name and argument types 'nvl(Int64)'. You might need to add explicit type casts.
	Candidate functions:
	nvl(Boolean/UInt8/UInt16/UInt32/UInt64/Int8/Int16/Int32/Int64/Float32/Float64/Utf8/LargeUtf8, Boolean/UInt8/UInt16/UInt32/UInt64/Int8/Int16/Int32/Int64/Float32/Float64/Utf8/LargeUtf8)
  1. There may be incorrect result when NVL's parameters contain a volatile expression.
    For example, given an expression expr is nullif(random()>0.5, true). The result of select nvl(expr, false)should always be false.
    But if it is rewritten as nvl(expr, expr, false), it could potentially yield a NULL value, when the random result of the first expr is less than 0.5, and the random result of the second expr is greater than 0.5.

I have performed such a test on this branch.

DataFusion CLI v36.0.0
❯  select nvl(nullif(random()>0.5, true), false) is null from (unnest(range(1,100)));
+--------------------------------------------------------------------------------------------------------------------------+
| nvl2(nullif(random() > Float64(0.5),Boolean(true)),nullif(random() > Float64(0.5),Boolean(true)),Boolean(false)) IS NULL |
+--------------------------------------------------------------------------------------------------------------------------+
| true                                                                                                                     |
| false                                                                                                                    |
| false                                                                                                                    |
| false                                                                                                                    |
| false                                                                                                                    |

There is approximately a 25% chance of an incorrect result.

select count(*) from (
   select
      nvl(nullif(random()>0.5, true), false)
   from (unnest(range(1,10000)))
) as t(a)
where a is null;
+----------+
| COUNT(*) |
+----------+
| 2517     |
+----------+

@@ -311,4 +318,33 @@ impl<'a, S: ContextProvider> SqlToRel<'a, S> {
}
}
}

fn all_suger_function() -> &'static [&'static str] {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be a const

const NVL_ALIAS: &'static [&'static str] = ...

planner_context: &mut PlannerContext,
) -> Result<Option<Expr>> {
match (name, args) {
// rewirte nvl function to nvl2 function
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// rewirte nvl function to nvl2 function
// rewrite nvl function to nvl2 function

@guojidan
Copy link
Contributor Author

guojidan commented Mar 6, 2024

I think that this manner of rewriting might have several downsides that need to be considered.

  1. Because NVL no longer exists in the function list, some of its features will be missing. For example, the feature in issue Add ScalarUDFs in missing function hints #9392, and the hint after entering wrong parameters.

On this PR's branch:

DataFusion CLI v36.0.0
❯ select nvl(1);
Error during planning: Invalid function 'nvl'.
Did you mean 'nanvl'?

The hint on the main branch would be better.

❯ select nvl(1);
Error during planning: No function matches the given name and argument types 'nvl(Int64)'. You might need to add explicit type casts.
	Candidate functions:
	nvl(Boolean/UInt8/UInt16/UInt32/UInt64/Int8/Int16/Int32/Int64/Float32/Float64/Utf8/LargeUtf8, Boolean/UInt8/UInt16/UInt32/UInt64/Int8/Int16/Int32/Int64/Float32/Float64/Utf8/LargeUtf8)
  1. There may be incorrect result when NVL's parameters contain a volatile expression.
    For example, given an expression expr is nullif(random()>0.5, true). The result of select nvl(expr, false)should always be false.
    But if it is rewritten as nvl(expr, expr, false), it could potentially yield a NULL value, when the random result of the first expr is less than 0.5, and the random result of the second expr is greater than 0.5.

I have performed such a test on this branch.

DataFusion CLI v36.0.0
❯  select nvl(nullif(random()>0.5, true), false) is null from (unnest(range(1,100)));
+--------------------------------------------------------------------------------------------------------------------------+
| nvl2(nullif(random() > Float64(0.5),Boolean(true)),nullif(random() > Float64(0.5),Boolean(true)),Boolean(false)) IS NULL |
+--------------------------------------------------------------------------------------------------------------------------+
| true                                                                                                                     |
| false                                                                                                                    |
| false                                                                                                                    |
| false                                                                                                                    |
| false                                                                                                                    |

There is approximately a 25% chance of an incorrect result.

select count(*) from (
   select
      nvl(nullif(random()>0.5, true), false)
   from (unnest(range(1,10000)))
) as t(a)
where a is null;
+----------+
| COUNT(*) |
+----------+
| 2517     |
+----------+

yes, need to redesign rewrite manner

@alamb
Copy link
Contributor

alamb commented Mar 10, 2024

Marking this PR as draft as I don't think it is waiting for more review (and I am trying to get through the PR review backlog).

Do we have any alternate approaches? Or is the plan to leave both nvl and nvl2? Or something else?

@alamb alamb marked this pull request as draft March 10, 2024 10:49
Copy link

Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or this will be closed in 7 days.

@github-actions github-actions bot added the Stale PR has not had any activity for some time label May 10, 2024
@github-actions github-actions bot closed this May 18, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
sql SQL Planner Stale PR has not had any activity for some time
Projects
None yet
Development

Successfully merging this pull request may close these issues.

rewrite nvl/ifnull to nvl2
4 participants