
Spark Dataframe count pushdown #29

Merged
merged 5 commits on Nov 16, 2018
Conversation

morazow (Contributor) commented on Nov 14, 2018

Fix for #24

Here, I was not sure whether `putIfAbsent` is call-by-name or not. That is, creating the connection
should be lazy and should not be evaluated if the key already exists.
If there is a `.count` action after loading the Exasol dataframe, then we should not shuffle all
that data through the network just to count it afterwards.
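On the call-by-name question: `java.util.concurrent.ConcurrentHashMap.putIfAbsent` takes a plain value, so in Scala its argument is evaluated eagerly even when the key already exists; `computeIfAbsent` takes a mapping function and only invokes it on a miss. A minimal sketch (the `makeConn` counter is hypothetical, just to observe evaluation):

```scala
import java.util.concurrent.ConcurrentHashMap

object LazyPutDemo {
  // Returns (count after putIfAbsent, count after computeIfAbsent)
  // for a key that is already present in the map.
  def demo(): (Int, Int) = {
    var created = 0
    def makeConn(): String = { created += 1; s"conn-$created" }

    val cache = new ConcurrentHashMap[String, String]()
    cache.put("k", "existing")

    // putIfAbsent takes a value: makeConn() runs even though "k" exists.
    cache.putIfAbsent("k", makeConn())
    val afterPut = created

    // computeIfAbsent takes a function: it is NOT invoked when the key is present.
    cache.computeIfAbsent("k", _ => makeConn())
    val afterCompute = created

    (afterPut, afterCompute)
  }

  def main(args: Array[String]): Unit =
    println(demo()) // the first call evaluates eagerly, the second does not
}
```

So `putIfAbsent` is not lazy here; using `computeIfAbsent` (or guarding with `containsKey`) avoids creating a throwaway connection.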

We detect a `df.count()` operation when the list of column names (`requiredColumns`) is empty. In
such cases, we perform a single count star query on Exasol and create a Spark RDD with as many
empty rows as the resulting count.
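The shape of that pushdown can be sketched without Spark as follows; `buildScan` and the `countStar` callback are hypothetical stand-ins for the relation's scan method and the `SELECT COUNT(*)` query sent to Exasol:

```scala
object CountPushdownDemo {
  // Stand-in for Spark's empty Row; the real code would use
  // org.apache.spark.sql.Row.empty inside an RDD.
  final case class EmptyRow()

  // Hypothetical buildScan: when no columns are required (a `count` action),
  // run a single pushed-down COUNT(*) and materialize that many empty rows
  // instead of transferring the table data over the network.
  def buildScan(requiredColumns: Array[String], countStar: () => Long): Seq[EmptyRow] =
    if (requiredColumns.isEmpty) {
      val cnt = countStar() // one small query instead of a full table scan
      Seq.fill(cnt.toInt)(EmptyRow())
    } else {
      sys.error("regular column scan, not shown in this sketch")
    }

  def main(args: Array[String]): Unit = {
    // Pretend the remote COUNT(*) returned 3.
    val rows = buildScan(Array.empty, () => 3L)
    println(rows.size)
  }
}
```

Spark then counts the empty rows locally, so `df.count()` returns the right value while only a single scalar travels over the network.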

Fixes #24
@morazow morazow changed the title Spark Datafram count pushdown Spark Dataframe count pushdown Nov 14, 2018
@morazow morazow merged commit d109416 into exasol:master Nov 16, 2018
jpizagno pushed a commit to jpizagno/spark-exasol-connector that referenced this pull request Dec 4, 2018
Spark Dataframe count action pushdown
@morazow morazow deleted the count-pushdown branch December 10, 2018 12:06