Support sort-merge join skew optimization #125

Open
richox opened this issue May 19, 2022 · 0 comments
Labels
enhancement New feature or request


richox commented May 19, 2022

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

Currently, Blaze hangs on TPC-DS q40 (1 TB) because of a severely skewed join (1,463,474,467 records joined against 5 records). Spark supports a skew option (skew=true) on sort-merge join (SMJ), which Blaze does not.
Until this feature is supported, we should not convert a skewed SMJ to native execution; otherwise the join is very slow or even hangs.
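
For illustration, here is a rough sketch of how a skewed partition could be detected before deciding whether to convert the join. It mirrors the rule Spark's adaptive execution is documented to use (a partition counts as skewed when it is larger than the median size times `spark.sql.adaptive.skewJoin.skewedPartitionFactor` and also larger than `spark.sql.adaptive.skewJoin.skewedPartitionThresholdInBytes`). The function name `skewed_partitions` and its inputs are hypothetical and are not part of Blaze or Spark.

```python
from statistics import median

MB = 1024 * 1024


def skewed_partitions(sizes, factor=5.0, threshold_bytes=256 * MB):
    """Return the indices of partitions that look skewed.

    Assumption: `sizes` holds the per-partition shuffle sizes in bytes for
    one side of the join. The defaults follow the Spark config defaults.
    """
    if not sizes:
        return []
    med = median(sizes)
    return [
        i
        for i, size in enumerate(sizes)
        # Both conditions must hold: far above the median AND large in absolute terms.
        if size > med * factor and size > threshold_bytes
    ]


# Three ordinary partitions and one very large one.
sizes = [100 * MB, 120 * MB, 110 * MB, 10 * 1024 * MB]
print(skewed_partitions(sizes))
```

A check along these lines could be used to keep a skewed SMJ on the Spark side until native skew handling exists.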

Describe the solution you'd like
Understand how Spark performs skew join optimization, then implement the same logic in Blaze.
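
As a starting point for that investigation, the general idea behind skew join handling can be sketched as follows: the oversized partition on the skewed side is split into several smaller chunks, each chunk is joined against a full copy of the matching partition from the other side, and the results are unioned. This is a minimal sketch under assumptions, not Spark's or Blaze's actual implementation: it splits by row count (Spark splits by ranges of map outputs), uses a hash lookup in place of a real sort-merge join for brevity, and covers only an inner join. All function names are hypothetical.

```python
from itertools import islice


def chunks(rows, rows_per_chunk):
    """Yield consecutive lists of at most `rows_per_chunk` rows."""
    it = iter(rows)
    while True:
        batch = list(islice(it, rows_per_chunk))
        if not batch:
            return
        yield batch


def inner_join(left, right):
    """Inner equi-join of (key, value) rows; stands in for one SMJ task."""
    index = {}
    for key, value in right:
        index.setdefault(key, []).append(value)
    return [(k, lv, rv) for k, lv in left for rv in index.get(k, [])]


def skew_split_join(skewed_left, small_right, rows_per_chunk):
    """Join a skewed partition chunk by chunk against the full other side."""
    # Each (chunk, small_right) pair would become an independent task.
    tasks = [(chunk, small_right) for chunk in chunks(skewed_left, rows_per_chunk)]
    result = []
    for left_chunk, right_copy in tasks:
        result.extend(inner_join(left_chunk, right_copy))
    return result


left = [(1, i) for i in range(10)]   # skewed side: many rows for key 1
right = [(1, "a"), (1, "b")]         # small side: few rows for key 1
split = skew_split_join(left, right, rows_per_chunk=3)
assert sorted(split) == sorted(inner_join(left, right))
```

Replicating the small side is what makes the split safe for an inner join; for outer, semi, and anti joins only one side can be split without changing the result, so the join type has to be checked first.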

Describe alternatives you've considered

Additional context

richox added the enhancement label May 19, 2022