
add random SQL function #303

Merged
merged 3 commits into from
May 17, 2021


@jimexist (Member) commented May 10, 2021

Which issue does this PR close?

Add a random() SQL function. Unlike the now() function, which yields the same value for every row, random() generates an independent value for each row.
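A minimal sketch of the intended semantics, under stated assumptions: a now()-style function is evaluated once and its value is repeated for every row, while a random()-style function draws a fresh value per row. XorShift64, constant_per_query, and random_per_row are hypothetical names. The hand-rolled xorshift generator only keeps the sketch free of dependencies; a real implementation would more likely use an established RNG crate, and this is not DataFusion's code.

```rust
/// Hypothetical xorshift64 generator (Marsaglia's 13/7/17 variant), used only
/// to keep this sketch free of external crates; not DataFusion code.
struct XorShift64 {
    state: u64,
}

impl XorShift64 {
    fn new(seed: u64) -> Self {
        // An all-zero state would make xorshift return zero forever.
        let state = if seed == 0 { 0x9E37_79B9_7F4A_7C15 } else { seed };
        XorShift64 { state }
    }

    /// Returns a value in [0.0, 1.0).
    fn next_f64(&mut self) -> f64 {
        let mut x = self.state;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.state = x;
        // Keep the top 53 bits so the quotient is exact and strictly below 1.0.
        (x >> 11) as f64 / (1u64 << 53) as f64
    }
}

/// now()-style (hypothetical): one value, computed once, repeated for every row.
fn constant_per_query(num_rows: usize, value: f64) -> Vec<f64> {
    vec![value; num_rows]
}

/// random()-style (hypothetical): a fresh value is drawn for every row.
fn random_per_row(num_rows: usize, rng: &mut XorShift64) -> Vec<f64> {
    (0..num_rows).map(|_| rng.next_f64()).collect()
}
```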

Closes #304

based on #307


@jimexist (Member Author) commented May 10, 2021

Standalone case (no source table):

> select random() r, random() r2 union all select random() r, random() r2;
+--------------------+--------------------+
| r                  | r2                 |
+--------------------+--------------------+
| 0.570782163304485  | 0.7122336302240444 |
| 0.4344519217750207 | 0.256188733362102  |
+--------------------+--------------------+
2 row in set. Query took 0 seconds.

Case over a table (each row gets its own values):

> select c1, random() r1, random() r2 from test;
+----+-----------------------+-----------------------+
| c1 | r1                    | r2                    |
+----+-----------------------+-----------------------+
| c  | 0.8858248695161692    | 0.6143187156567651    |
| d  | 0.6852840904479431    | 0.2709251169769842    |
| b  | 0.630098981690546     | 0.6077598267065687    |
| a  | 0.16307864728987242   | 0.8227547597140914    |
| b  | 0.008702150304241929  | 0.38766493311897166   |
| b  | 0.9450226561353341    | 0.16156125141103117   |
| e  | 0.6926494011573385    | 0.653725968146263     |
| a  | 0.4099823846109887    | 0.07170807057149098   |
| d  | 0.15579483001397176   | 0.3098832344415323    |
| a  | 0.8059293634597469    | 0.7994994516372975    |
| d  | 0.6593555317339335    | 0.19312486980857102   |
| a  | 0.9086155399677391    | 0.7665270701938516    |
| e  | 0.4044800758190923    | 0.05197052666680535   |
| d  | 0.0025795018740371045 | 0.46326292155346094   |
| b  | 0.2587391359273177    | 0.9162104619942888    |
| c  | 0.13645516508526523   | 0.15752850692883258   |
| e  | 0.3877465007257497    | 0.5699901097930642    |
| d  | 0.47780316828309766   | 0.7885763689480612    |
| d  | 0.22794474273876175   | 0.6095793437463859    |
| e  | 0.5455296223081574    | 0.4289355617327004    |
| e  | 0.2133157137081647    | 0.4184730286437153    |
| d  | 0.8158975348529787    | 0.394255938540677     |
| a  | 0.656112907355159     | 0.4385685299440685    |
| e  | 0.7187571688285945    | 0.4932180835725457    |
| c  | 0.4460089214506695    | 0.3288857238339975    |
| a  | 0.7659867073691242    | 0.7338590718453064    |
| c  | 0.8312317998888263    | 0.713249790823179     |
| a  | 0.5034689184624486    | 0.3564531411596683    |
| a  | 0.49555953754158066   | 0.5288964005078611    |
| b  | 0.488574157540709     | 0.34840668143906095   |
| e  | 0.962322293398729     | 0.0017991571576252419 |
| c  | 0.405323557632822     | 0.13801508069882895   |
| e  | 0.44810564754006976   | 0.32609793459010716   |
| b  | 0.0481996264470097    | 0.5624886587309861    |
| a  | 0.6856284932837569    | 0.30189559597954085   |
| c  | 0.4392762780370185    | 0.22778780763979856   |
| d  | 0.01862005761513874   | 0.9484007330304798    |
| c  | 0.5493861573252898    | 0.6787226076159059    |
| c  | 0.6804506222983264    | 0.40012929715726475   |
| c  | 0.17031205277616834   | 0.28969160031465124   |
| b  | 0.5516607121077097    | 0.32765665335953154   |
| d  | 0.7902397504137304    | 0.8047820664058987    |
| d  | 0.6235135313619369    | 0.8611740902370557    |
| a  | 0.4089158775121935    | 0.7632865452823412    |
| e  | 0.6479818623367095    | 0.3303923993061251    |
| b  | 0.8405383629477621    | 0.5761157012684217    |
| b  | 0.16013306898301072   | 0.18377799688319274   |
| c  | 0.5237107246024528    | 0.18702870828721596   |
| a  | 0.4267698184654345    | 0.6080320114682305    |
| d  | 0.6752001786973243    | 0.18744579948119555   |
| b  | 0.06394198453121214   | 0.8697468928632974    |
| c  | 0.5533880608032804    | 0.410087636982861     |
| d  | 0.17195857936051007   | 0.9642347754732317    |
| d  | 0.24714036094951686   | 0.2087533372695889    |
| b  | 0.38223418402701226   | 0.44797491855182825   |
| d  | 0.354147713947109     | 0.1583774861902576    |
| e  | 0.8978738349376183    | 0.6870679270888751    |
| b  | 0.0962990269141899    | 0.9251103720761726    |
| a  | 0.08754479049780262   | 0.3061691178397379    |
| b  | 0.8347877947374489    | 0.10492831402932445   |
| c  | 0.6772625649184507    | 0.5267610906406157    |
| b  | 0.6956531376927251    | 0.9243742506850876    |
| c  | 0.6096066968750522    | 0.15500300961880753   |
| e  | 0.8991012695527614    | 0.014652679069998786  |
| e  | 0.4048500168573612    | 0.7288386405759564    |
| d  | 0.8738661862341139    | 0.5736561149057426    |
| e  | 0.79628104001088      | 0.10359057613551692   |
| c  | 0.6015511737143195    | 0.4246275983489023    |
| d  | 0.5976422825371586    | 0.7110161517521416    |
| e  | 0.13607511799429672   | 0.5692416938012763    |
| e  | 0.7050104248345099    | 0.48394357812924893   |
| a  | 0.3794121594380737    | 0.2762570292624906    |
| a  | 0.3756937524394046    | 0.8349592211879893    |
| e  | 0.6780514311786121    | 0.06202299343125328   |
| a  | 0.9916898758023276    | 0.25480434620940917   |
| b  | 0.11861101237985605   | 0.36793787570512615   |
| e  | 0.9779069828477198    | 0.706631326575605     |
| c  | 0.05169106965696968   | 0.757319676986232     |
| e  | 0.5008578861141886    | 0.5542545101873164    |
| c  | 0.3080413824916599    | 0.6461181444579944    |
| a  | 0.3499918991104509    | 0.3738674842979186    |
| c  | 0.49266139042465706   | 0.44786989508250286   |
| b  | 0.8294106157763355    | 0.8250357289976049    |
| a  | 0.8145941535102705    | 0.010227803378715983  |
| a  | 0.19432122069500224   | 0.500087727457039     |
| c  | 0.5228722683904334    | 0.19655375516418694   |
| a  | 0.7264844525200564    | 0.7118351314074298    |
| c  | 0.1526188214350377    | 0.2543946368251362    |
| c  | 0.33950854157702826   | 0.960977006313132     |
| c  | 0.6350990913317207    | 0.7731276647898677    |
| b  | 0.24010294541852772   | 0.8652139521697786    |
| a  | 0.0883054439477311    | 0.4627145656673548    |
| a  | 0.540637100589954     | 0.8545095562126641    |
| b  | 0.8328479394485582    | 0.5200373050323923    |
| d  | 0.527288611622466     | 0.8929364305158876    |
| e  | 0.20030569328547343   | 0.5677935767953404    |
| e  | 0.7779991255911018    | 0.9255994347346632    |
| d  | 0.8406514575091932    | 0.4562466008463426    |
| b  | 0.7545561897792099    | 0.47341049730312923   |
| e  | 0.2561046134849507    | 0.28863618306585237   |
+----+-----------------------+-----------------------+
100 row in set. Query took 0 seconds.

@jimexist marked this pull request as draft May 10, 2021 07:47
@jimexist changed the title from "add random SQL function" to "WIP add random SQL function" May 10, 2021
@jimexist force-pushed the add-random-function branch 2 times, most recently from 37c8fa0 to a9915e7 May 10, 2021 10:12
@jimexist marked this pull request as ready for review May 10, 2021 10:12
@jimexist changed the title from "WIP add random SQL function" to "add random SQL function" May 10, 2021
@jimexist force-pushed the add-random-function branch 3 times, most recently from 30d6368 to 35d440c May 11, 2021 00:41
// evaluate the arguments, if there are no arguments we'll instead pass in a null array of
// batch size (as a convention)
let inputs = match self.args.len() {
0 => vec![ColumnarValue::Array(Arc::new(NullArray::new(
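The convention in this excerpt (a call with no arguments receives a single null array whose length is the batch size) can be sketched without Arrow or DataFusion. Array, ArgExpr, and evaluate_args below are hypothetical simplifications, not DataFusion's ColumnarValue or Arrow's NullArray.

```rust
/// Hypothetical stand-in for an Arrow array, reduced to what the sketch needs.
#[derive(Debug, Clone, PartialEq)]
enum Array {
    /// Carries only a length, like a null array (no buffers, no validity).
    Null(usize),
    Float64(Vec<f64>),
}

impl Array {
    fn len(&self) -> usize {
        match self {
            Array::Null(n) => *n,
            Array::Float64(v) => v.len(),
        }
    }
}

/// Hypothetical argument expression: here just a column index into the batch.
type ArgExpr = usize;

/// Evaluates a function's arguments against a batch of columns. With no
/// arguments, a single null array of batch length is passed instead, so the
/// callee can still learn how many rows to produce.
fn evaluate_args(args: &[ArgExpr], batch: &[Array], num_rows: usize) -> Vec<Array> {
    match args.len() {
        0 => vec![Array::Null(num_rows)],
        _ => args.iter().map(|&i| batch[i].clone()).collect(),
    }
}
```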
@Dandandan (Contributor) commented May 11, 2021

What about passing the length as a ColumnarValue::Scalar value for now?
I'm not sure I'm totally happy with that either, but it will avoid generating a temporary array just to access the length.
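A rough sketch of the scalar alternative, assuming a simplified scalar type that, like the ScalarValue discussed in this thread, has no usize variant, so the row count has to travel as a u32. Scalar, Columnar, encode_row_count, and row_count are hypothetical names, not DataFusion APIs.

```rust
use std::convert::TryFrom;

/// Hypothetical simplified scalar type with no usize variant.
#[derive(Debug, Clone, PartialEq)]
enum Scalar {
    UInt32(Option<u32>),
}

/// Hypothetical simplified columnar value: one scalar or a whole column.
#[derive(Debug, Clone, PartialEq)]
enum Columnar {
    Scalar(Scalar),
    Array(Vec<f64>),
}

/// Caller side: encode the batch size as a scalar instead of a temporary array.
/// The usize -> u32 conversion can fail and must be handled.
fn encode_row_count(num_rows: usize) -> Result<Columnar, String> {
    let n = u32::try_from(num_rows)
        .map_err(|_| format!("batch of {} rows does not fit in u32", num_rows))?;
    Ok(Columnar::Scalar(Scalar::UInt32(Some(n))))
}

/// Callee side: a zero-argument function recovers the row count from the scalar.
fn row_count(arg: &Columnar) -> Result<usize, String> {
    match arg {
        Columnar::Scalar(Scalar::UInt32(Some(n))) => Ok(*n as usize),
        other => Err(format!("expected a non-null UInt32 row count, got {:?}", other)),
    }
}
```

Compared with passing an array, this avoids building one but adds a fallible conversion on the caller side and a type check on the callee side.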

Member Author

The alternative is to drastically change the signature of scalar functions. I confess that, compared with that, this is cleaner (if a bit hacky).

I will change it to use a scalar.

Member Author

I guess another option, then, is to add a new type of node that models zero-argument functions.
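One way to picture the dedicated-node option is an expression enum with its own variant for zero-argument functions, whose evaluation receives the row count directly instead of a placeholder argument. Expr, its variants, and the representation of columns as Vec<f64> are all hypothetical; this is not DataFusion's physical expression API.

```rust
/// Hypothetical expression tree with a dedicated node for zero-argument
/// functions, so no placeholder argument is needed.
enum Expr {
    /// Regular scalar function: receives its evaluated argument columns.
    ScalarFunction {
        fun: fn(&[Vec<f64>]) -> Vec<f64>,
        args: Vec<Expr>,
    },
    /// Zero-argument function: receives only the number of rows to produce.
    ZeroArgFunction { fun: fn(usize) -> Vec<f64> },
    /// Column reference by index into the batch.
    Column(usize),
}

impl Expr {
    fn evaluate(&self, batch: &[Vec<f64>], num_rows: usize) -> Vec<f64> {
        match self {
            Expr::Column(i) => batch[*i].clone(),
            Expr::ZeroArgFunction { fun } => fun(num_rows),
            Expr::ScalarFunction { fun, args } => {
                let inputs: Vec<Vec<f64>> =
                    args.iter().map(|a| a.evaluate(batch, num_rows)).collect();
                fun(inputs.as_slice())
            }
        }
    }
}
```

The trade-off is an extra node type that planning and serialization code would all have to handle.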

Contributor

See also #307

Member

Note that a NullArray consists of zero buffers, zero children, no validity bitmap, and one data type, so it is very cheap to instantiate. Its advantage over a ScalarValue is that the usual way of getting a length is preserved: call array.len() as with any other array.
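A sketch of why a length-only placeholder is cheap and keeps the usual len() semantics. ArrayLike, NullLike, Float64Like, and rows_to_produce are hypothetical stand-ins; a real Arrow NullArray also carries its data type, which this sketch omits.

```rust
use std::mem::size_of;

/// Hypothetical minimal array interface: every array reports its length the same way.
trait ArrayLike {
    fn len(&self) -> usize;
}

/// Stand-in for a null array: no value buffers and no validity bitmap, only a length.
struct NullLike {
    len: usize,
}

/// Stand-in for a primitive array backed by a value buffer.
struct Float64Like {
    values: Vec<f64>,
}

impl ArrayLike for NullLike {
    fn len(&self) -> usize {
        self.len
    }
}

impl ArrayLike for Float64Like {
    fn len(&self) -> usize {
        self.values.len()
    }
}

/// A zero-argument function reads the row count exactly as it would
/// from any other input array.
fn rows_to_produce(input: &dyn ArrayLike) -> usize {
    input.len()
}

/// The placeholder is just one machine word and needs no heap allocation.
fn null_like_size() -> usize {
    size_of::<NullLike>()
}
```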

I am not married to either option; I was just trying to think about this from a documentation perspective:

We support zero-argument UDFs. They MUST be declared as accepting zero arguments, and the implementing function MUST accept a single argument. DataFusion will pass an Array to it, from which you can retrieve its length via Array::len(). The function MUST return an array whose length equals the length of that input array.
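The contract quoted above can be sketched as a plain function: it receives exactly one array, reads only its length, and returns that many rows. Arg and row_index_udf are hypothetical; row_index_udf simply numbers the rows so the output length is easy to check, and it is not a DataFusion built-in.

```rust
/// Hypothetical simplified argument type (not DataFusion's ColumnarValue).
#[derive(Debug)]
enum Arg {
    /// Placeholder whose only meaningful property is its length.
    NullArray(usize),
    Float64Array(Vec<f64>),
}

impl Arg {
    fn len(&self) -> usize {
        match self {
            Arg::NullArray(n) => *n,
            Arg::Float64Array(v) => v.len(),
        }
    }
}

/// Hypothetical zero-argument UDF: declared with no SQL arguments, implemented
/// as a function of exactly one array, returning one value per input row.
fn row_index_udf(args: &[Arg]) -> Result<Vec<f64>, String> {
    match args {
        [input] => Ok((0..input.len()).map(|i| i as f64).collect()),
        _ => Err(format!(
            "expected exactly one placeholder array, got {} arguments",
            args.len()
        )),
    }
}
```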

If we pass a scalar of any type and the evaluation is distributed, I believe we will have to serialize Scalar -> Array in Ballista.

Member Author

@alamb @Dandandan and @jorgecarleitao thanks for the discussion.

I agree that both approaches are similar and equally “hacky”, but for lack of a better solution they are okay. I'd slightly prefer the null array, because there is no ScalarValue::USize and having to convert to and from UInt32 is a bit cumbersome.

Contributor

@jorgecarleitao / @Dandandan / @jimexist are we all cool with using a NullArray to pass "how many rows are passed to this function that has no input arguments"?

Contributor

Yeah, let's do that for now

Member

👍

Contributor

So I think the plan should be to merge #328 first and then rebase this PR to pick up that change.

@codecov-commenter commented May 16, 2021

Codecov Report

Merging #303 (fb74a59) into master (ed92673) will increase coverage by 0.01%.
The diff coverage is 90.00%.


@@            Coverage Diff             @@
##           master     #303      +/-   ##
==========================================
+ Coverage   75.71%   75.72%   +0.01%     
==========================================
  Files         143      143              
  Lines       23881    23910      +29     
==========================================
+ Hits        18081    18107      +26     
- Misses       5800     5803       +3     

| Impacted Files | Coverage Δ |
|---|---|
| datafusion/src/physical_plan/type_coercion.rs | 99.32% <ø> (ø) |
| datafusion/src/physical_plan/functions.rs | 89.58% <85.71%> (-0.03%) ⬇️ |
| datafusion/src/physical_plan/math_expressions.rs | 90.00% <86.66%> (-10.00%) ⬇️ |
| datafusion/tests/sql.rs | 99.88% <100.00%> (+<0.01%) ⬆️ |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update ed92673...fb74a59.

@alamb (Contributor) left a comment

I think this is looking really nice now. Thank you, @jimexist!

@alamb added the datafusion and enhancement labels May 17, 2021
@alamb merged commit 6c050b8 into apache:master May 17, 2021
@jimexist deleted the add-random-function branch May 25, 2021 12:55
Labels: datafusion (Changes in the datafusion crate), enhancement (New feature or request)
Development

Successfully merging this pull request may close these issues.

add random SQL function
6 participants