I am unable to insert null values of varchar datatype into Redshift. #27
Comments
Thanks for reporting the issue. We will look into it and get back to you.
Could you please check your code at this line: basically, the application is trying to call setNull with Types.CLOB (i.e. 2005) for the third parameter. That type is not supported, which is why you see the error.
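To illustrate, here is a minimal standalone sketch of the failing call, outside Spark (the host, database, and credentials are placeholders; the table and column names are taken from your logs):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.Types;

// Hypothetical reproduction: on driver versions before the fix, binding the
// null varchar parameter with Types.CLOB (2005) triggers the reported error,
// while Types.VARCHAR binds cleanly.
public class NullVarcharRepro {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                 "jdbc:redshift://host:5439/dbname", "user", "password"); // placeholders
             PreparedStatement ps = conn.prepareStatement(
                 "INSERT INTO tcms_fpl.Test2 (\"emp\",\"sal\",\"dep\") VALUES (?,?,?)")) {
            ps.setString(1, "sa");
            ps.setNull(2, Types.DOUBLE);     // supported type code, binds as FLOAT8
            ps.setNull(3, Types.CLOB);       // 2005: not supported, causes the error
            // ps.setNull(3, Types.VARCHAR); // the type code that works
            ps.executeUpdate();
        }
    }
}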
Spark is reading from MySQL and writing into Redshift through the JDBC driver with the lines of code below. Basically, the JDBC driver takes care of casting and the other details; I didn't write anything other than these few lines. If you look at the write method, I am just referencing the driver, and the driver takes care of the rest. Read from MySQL to create a DataFrame, then write into Redshift:

Mysql_df1.write.format("jdbc") \
    .option("url", "jdbc:redshift://:5439/dbname?rewriteBatchedStatements = true;LogLevel=6;LogPath=/temp/log") \
    .option("dbtable", "tcms_fpl.Test2") \
    .option("user", "uuuu") \
    .option("password", "ppppp") \
    .option("batchsize", 10) \
    .option("isolationLevel", "NONE") \
    .mode("overwrite") \
    .save()
That means Spark is converting the code without regard to database capability. MySQL must support CLOB, but Redshift does not.
I don't think Spark is converting anything here; it keeps whatever datatypes it has. Here is the Spark DataFrame datatype after reading the data from MySQL:

|-- sal: double (nullable = true)

If you look at the log, the JDBC driver is trying to parse the datatype before inserting into Redshift. This is the trace I captured of the JDBC driver's write logs before it inserts into Redshift, where the driver is preparing the insert query:

Jun 23 15:12:15.076 DEBUG [82 Executor task launch worker for task 0.0 in stage 0.0 (TID 0)] com.amazon.redshift.core.v3.QueryExecutorImpl.sendParse: FE=> Parse(stmt=S_1-2481566321930127,query="INSERT INTO tcms_fpl.Test2 ("emp","sal","dep") VALUES ($1,$2,$3)",oids={0,701,26})
com.amazon.redshift.core.v3.QueryExecutorImpl.processResultsOnThread: <=BE ParseComplete [S_1-2481566321930127]
com.amazon.redshift.core.v3.QueryExecutorImpl.sendBind: FE=> Bind(stmt=S_1-2481566321930127,portal=C_2-2481567062770625,$1=<'sa'>,type=VARCHAR,$2=,type=FLOAT8,$3=,type=OID)
com.amazon.redshift.core.v3.QueryExecutorImpl.sendBind: FE=> Bind(stmt=S_1-2481566321930127,portal=C_3-2481567064637784,$1=<'sa'>,type=UNSPECIFIED,$2=,type=FLOAT8,$3=,type=OID)
com.amazon.redshift.core.v3.QueryExecutorImpl.receiveNoticeResponse: <=BE NoticeResponse(INFO: Function "text(oid)" not supported.

Thanks
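(An aside for readers of the trace, my reading only: assuming the PostgreSQL wire-protocol type OIDs that this driver's v3 core appears to derive from, the Parse message's oids={0,701,26} declares the three parameter types up front, and the null varchar parameter $3 is declared as oid, which is why the server emits the Function "text(oid)" not supported notice when asked to render it as text. A tiny annotation helper:)

import java.util.Map;

// Hypothetical helper mapping the numeric parameter OIDs from the
// Parse(...oids={0,701,26}) line above to their wire-protocol type names.
public final class TraceOidNames {
    static final Map<Integer, String> NAMES = Map.of(
            0,   "UNSPECIFIED (server infers the type)",    // $1 emp
            701, "FLOAT8 (double precision)",               // $2 sal
            26,  "OID (what the CLOB null was mapped to)"); // $3 dep

    public static void main(String[] args) {
        for (int oid : new int[] {0, 701, 26}) {
            System.out.println(oid + " -> " + NAMES.get(oid));
        }
    }
}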
As per the JDBC log, the application is calling setNull with 2005 as the datatype. So please check who is calling setNull with 2005 as the datatype.
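If it helps to pinpoint the caller, here is a hypothetical debugging sketch (the class name and wiring are mine, not part of the driver or any API): it wraps a PreparedStatement in a dynamic proxy and prints a stack trace whenever setNull is called, so you can see which layer passes type code 2005.

import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.sql.PreparedStatement;

// Hypothetical helper, for debugging only.
public final class SetNullTracer {
    public static PreparedStatement trace(PreparedStatement real) {
        InvocationHandler handler = (proxy, method, args) -> {
            if ("setNull".equals(method.getName())) {
                System.err.printf("setNull(param=%s, sqlType=%s) called from:%n",
                        args[0], args[1]);
                new Throwable().printStackTrace(); // show who made the call
            }
            return method.invoke(real, args);
        };
        return (PreparedStatement) Proxy.newProxyInstance(
                PreparedStatement.class.getClassLoader(),
                new Class<?>[] { PreparedStatement.class },
                handler);
    }
}

In this case the caller is Spark itself rather than user code: the stack trace later in this issue goes through org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils, and as far as I can tell Spark's generic JDBC mapping uses CLOB as the null type code for string columns when no database-specific dialect overrides it.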
This is from Types.class of the JDK:

/**
 * The constant in the Java programming language, sometimes referred to
 * as a type code, that identifies the generic SQL type CLOB.
 * @since 1.2
 */
public final static int CLOB = 2005;
OK, thanks for the update. How do I fix the problem? I'd appreciate it if you could guide or advise me on how to take it forward.
As per my understanding, Spark is using that JDBC driver to establish the connection and parse the data to be Redshift compatible, but you're saying something different. I am not sure how to move forward, as Spark is internally generating the code.
Thanks
Subbarao
… On Jun 24, 2021, at 2:02 PM, ilesh garish ***@***.***> wrote:
This is from Types.class of the JDK:
/**
 * The constant in the Java programming language, sometimes referred to
 * as a type code, that identifies the generic SQL type
 * CLOB.
 * @since 1.2
 */
public final static int CLOB = 2005;
I can think of two ways to move forward:
See the similar issue with another Spark connector. Note: we started code changes to treat setNull(CLOB) as setNull(VARCHAR).
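A minimal sketch of that change (my illustration of the idea, not the actual driver source): normalize the unsupported CLOB type code to VARCHAR before binding a null parameter.

import java.sql.Types;

// Illustrative only: the shape of a setNull(CLOB) -> setNull(VARCHAR) remap.
public final class NullTypeMapper {
    static int normalize(int sqlType) {
        // Redshift has no CLOB type; bind null CLOBs as VARCHAR instead.
        return sqlType == Types.CLOB ? Types.VARCHAR : sqlType;
    }

    public static void main(String[] args) {
        System.out.println(normalize(Types.CLOB) == Types.VARCHAR); // prints true
    }
}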
Fixed in 2.0.0.6, which has just been released.
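(If you pull the driver from Maven Central, I believe the artifact is com.amazon.redshift:redshift-jdbc42, so upgrading means picking up version 2.0.0.6 of that artifact; please double-check the coordinates for your setup.)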
Driver version
Redshift version
Client Operating System
JAVA/JVM version
Table schema
Problem description
Expected behaviour:
empid(int)   sal(double)   empname(varchar)
++++++++++   +++++++++++   ++++++++++++++++
10           100.00        'hello'            -- able to insert the records
Error message/stack trace:
redshift_jdbc_connection_1.log
redshift_jdbc_connection_2.log
redshift_jdbc.log
java.sql.BatchUpdateException: Batch entry 0 INSERT INTO tcms_fpl.Test2 ("emp","sal","dep") VALUES ('sa',NULL,NULL) was aborted: ERROR: Specified types or functions (one per INFO message) not supported on Redshift tables. Call getNextException to see other errors in the batch.
at com.amazon.redshift.jdbc.BatchResultHandler.handleCompletion(BatchResultHandler.java:195)
at com.amazon.redshift.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:609)
at com.amazon.redshift.jdbc.RedshiftStatementImpl.internalExecuteBatch(RedshiftStatementImpl.java:978)
at com.amazon.redshift.jdbc.RedshiftStatementImpl.executeBatch(RedshiftStatementImpl.java:1006)
at com.amazon.redshift.jdbc.RedshiftPreparedStatement.executeBatch(RedshiftPreparedStatement.java:1723)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.savePartition(JdbcUtils.scala:692)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.$anonfun$saveTable$1(JdbcUtils.scala:856)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.$anonfun$saveTable$1$adapted(JdbcUtils.scala:854)
at org.apache.spark.rdd.RDD.$anonfun$foreachPartition$2(RDD.scala:1020)
at org.apache.spark.rdd.RDD.$anonfun$foreachPartition$2$adapted(RDD.scala:1020)
at org.apache.spark.SparkContext.$anonfun$runJob$5(SparkContext.scala:2242)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:131)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:497)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1439)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:500)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
... 1 more
Caused by: com.amazon.redshift.util.RedshiftException: ERROR: Specified types or functions (one per INFO message) not supported on Redshift tables.
at com.amazon.redshift.core.v3.QueryExecutorIm
Any other details that can be helpful:
JDBC trace logs
Reproduction code
Mysql_df1 = spark.read.format('jdbc') \
    .option("url", "jdbc:mysql://hostname/dbname") \
    .option("driver", "com.mysql.jdbc.Driver") \
    .option("dbtable", "(select * from schema.Test) as hello") \
    .option("user", "uuuu") \
    .option("password", "ppppp") \
    .load()

Write into Redshift:

Mysql_df1.write.format("jdbc") \
    .option("url", "jdbc:redshift://:5439/dbname?rewriteBatchedStatements = true;LogLevel=6;LogPath=/temp/log") \
    .option("dbtable", "tcms_fpl.Test2") \
    .option("user", "uuuu") \
    .option("password", "ppppp") \
    .option("batchsize", 10) \
    .option("isolationLevel", "NONE") \
    .mode("overwrite") \
    .save()