
Spark Dataframe count pushdown #29

Merged
merged 5 commits on Nov 16, 2018
Conversation

morazow (Contributor) commented on Nov 14, 2018

Fix for #24

Here, I was not sure whether `putIfAbsent` is call-by-name or not. That is, creating the connection
should be lazy and should not be evaluated if the key already exists.
If there is a `.count` action after loading the Exasol dataframe, then we should not shuffle all
that data through the network just to count it afterwards.
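On the call-by-name question: `java.util.concurrent.ConcurrentHashMap.putIfAbsent` takes a plain value, so in Scala its argument is evaluated eagerly even when the key already exists; `computeIfAbsent` takes a mapping function and only invokes it on a miss. A minimal sketch (the `makeConn` counter is hypothetical, just to observe evaluation):

```scala
import java.util.concurrent.ConcurrentHashMap

object LazyPutDemo {
  // Returns (count after putIfAbsent, count after computeIfAbsent)
  // for a key that is already present in the map.
  def demo(): (Int, Int) = {
    var created = 0
    def makeConn(): String = { created += 1; s"conn-$created" }

    val cache = new ConcurrentHashMap[String, String]()
    cache.put("k", "existing")

    // putIfAbsent takes a value: makeConn() runs even though "k" exists.
    cache.putIfAbsent("k", makeConn())
    val afterPut = created

    // computeIfAbsent takes a function: it is NOT invoked when the key is present.
    cache.computeIfAbsent("k", _ => makeConn())
    val afterCompute = created

    (afterPut, afterCompute)
  }

  def main(args: Array[String]): Unit =
    println(demo()) // the first call evaluates eagerly, the second does not
}
```

So `putIfAbsent` is not lazy here; using `computeIfAbsent` (or guarding with `containsKey`) avoids creating a throwaway connection.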

We detect a `df.count()` operation when the list of column names (`requiredColumns`) is empty. In
such cases, we perform a single count star query on Exasol and create a Spark RDD with as many
empty rows as the resulting count.
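The shape of that pushdown can be sketched without Spark as follows; `buildScan` and the `countStar` callback are hypothetical stand-ins for the relation's scan method and the `SELECT COUNT(*)` query sent to Exasol:

```scala
object CountPushdownDemo {
  // Stand-in for Spark's empty Row; the real code would use
  // org.apache.spark.sql.Row.empty inside an RDD.
  final case class EmptyRow()

  // Hypothetical buildScan: when no columns are required (a `count` action),
  // run a single pushed-down COUNT(*) and materialize that many empty rows
  // instead of transferring the table data over the network.
  def buildScan(requiredColumns: Array[String], countStar: () => Long): Seq[EmptyRow] =
    if (requiredColumns.isEmpty) {
      val cnt = countStar() // one small query instead of a full table scan
      Seq.fill(cnt.toInt)(EmptyRow())
    } else {
      sys.error("regular column scan, not shown in this sketch")
    }

  def main(args: Array[String]): Unit = {
    // Pretend the remote COUNT(*) returned 3.
    val rows = buildScan(Array.empty, () => 3L)
    println(rows.size)
  }
}
```

Spark then counts the empty rows locally, so `df.count()` returns the right value while only a single scalar travels over the network.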

Fixes #24
@morazow morazow changed the title Spark Datafram count pushdown Spark Dataframe count pushdown Nov 14, 2018
@morazow morazow merged commit d109416 into exasol:master Nov 16, 2018
jpizagno pushed a commit to jpizagno/spark-exasol-connector that referenced this pull request Dec 4, 2018
Spark Dataframe count action pushdown
@morazow morazow deleted the count-pushdown branch December 10, 2018 12:06