-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Issue #5] enable user provided schema when reading exasol table #19
[Issue #5] enable user provided schema when reading exasol table #19
Conversation
Good morning @3cham, Thanks for PR! Please keep up the good work! Overall, everything looks good. However, I have some suggestions. I am going to list them from easier (easy to explain :)) to harder points.
// This will use inferSchema if configSchema is None
override def schema: StructType = configSchema.fold(inferSchema){ userSchema =>
logger.info(s"Using provided schema $userSchema")
userSchema
} You can read about it here, http://blog.originate.com/blog/2014/06/15/idiomatic-scala-your-options-do-not-match/, https://kwangyulseo.com/2014/05/21/scala-option-fold-vs-option-mapgetorelse/
private def fromParameters(params: Map...): (String, ExasolManager) = {
val queryString = parameters.get("query") match ..
...
val manager = ExasolConnectionManager(config)
(queryString, manager)
} And both override def createRelation(...): BaseRelation = {
val (queryString, manager) = fromParameters(parameters)
new ExasolRelation(sqlContext, queryString, None, manager)
}
override def createRelation(
....,
schema: StructType
): BaseRelation = {
val (queryString, manager) = fromParameters(parameters)
new ExasolRelation(sqlContext, queryString, Option(schema), manager)
} Hope these are clear. Please let me know if you have questions! |
} | ||
} | ||
|
||
override def schema: StructType = querySchema | ||
|
||
override def buildScan(): RDD[Row] = | ||
buildScan(Array.empty, Array.empty) | ||
buildScan(configSchema.map(_.name).toArray, Array.empty) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can remove using configSchema here and keep as before.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @morazow, this is where I spent a lot of time to think about. In the ExasolRDD implementation, we use Converter.resultSetToRows(resultSet, querySchema)
, in this case the jdbc connection still have to fetch all columns including the ones not provided in the configSchema. If the idea is user must provide the full schema that is returns from the queryString, then it is fine to let it like that. However, we could restrict jdbc to fetch only the required columns. Do you think it is a good idea?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @3cham,
Yes, we only fetch required columns for both cases when schema is provided and is inferred.
I comment below, for better explaining.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahh I could not comment on un-changed lines :)
Below, on lines 51 / 60 you will see,
Types.selectColumns(requiredColumns, schema)
Here, schema will be either user provided configSchema
or inferred inferSchema
. Then if we have required columns we select only those from that schema
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, yes, I oversaw this line. Thank you for pointing that out :)
class ExasolRelation( | ||
context: SQLContext, | ||
queryString: String, | ||
configSchema: StructType, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
configSchema with type Option[StructType]
|
||
override def shortName(): String = "exasol" | ||
|
||
override def createRelation( | ||
sqlContext: SQLContext, | ||
parameters: Map[String, String] | ||
): BaseRelation = | ||
createRelation(sqlContext, parameters, new StructType()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Call new private function to check that query exist and create manager variable. Then new relation with schema as None.
val (queryString, manager) = fromParameter(params)
new ExasolRelation(context, queryString, None, manager)
@@ -27,7 +39,6 @@ class DefaultSource extends RelationProvider with DataSourceRegister { | |||
val config = ExasolConfiguration(parameters) | |||
val manager = ExasolConnectionManager(config) | |||
|
|||
new ExasolRelation(sqlContext, queryString, manager) | |||
new ExasolRelation(sqlContext, queryString, schema, manager) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as before but with schema provided,
val (queryString, manager) = fromParameter(params)
new ExasolRelation(context, queryString, Option(schema), manager)
And put all other into separate private function which returns queryString
and manager
.
@@ -1,20 +1,32 @@ | |||
package com.exasol.spark | |||
|
|||
import java.util.InvalidPropertiesFormatException |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This exception is not used anywhere
1177d38
to
f555955
Compare
Hi @morazow, please check the following up commits. If you don't mind, allow me to change the author info in the last PR ;) |
Hey @3cham , Yes, sure. I was also confused not seeing you in contributors list. Just for your information, I am also going to add authors list from my fork, https://github.com/morazow/spark-exasol-connector/blob/task/authors/AUTHORS.md. Best |
@morazow: many thanks, I used my lap from work with a different email config. That's why I need to change the info from the last PR. |
[Issue #5] enable user provided schema when reading exasol table
No description provided.