I am unable to insert null values of varchar datatype into Redshift. #27
Comments
Thanks for reporting the issue. We will look into it and get back to you.
Could you please check your code at this line: basically, the application is trying to call setNull with Types.CLOB (i.e. 2005) for the third parameter. That type is not supported, which is why you see the error.
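To illustrate, here is a minimal standalone sketch of the failing call, outside Spark (the host, database, and credentials are placeholders; the table and column names are taken from your logs):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.Types;

// Hypothetical reproduction: on driver versions before the fix, binding the
// null varchar parameter with Types.CLOB (2005) triggers the reported error,
// while Types.VARCHAR binds cleanly.
public class NullVarcharRepro {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                 "jdbc:redshift://host:5439/dbname", "user", "password"); // placeholders
             PreparedStatement ps = conn.prepareStatement(
                 "INSERT INTO tcms_fpl.Test2 (\"emp\",\"sal\",\"dep\") VALUES (?,?,?)")) {
            ps.setString(1, "sa");
            ps.setNull(2, Types.DOUBLE);     // supported type code, binds as FLOAT8
            ps.setNull(3, Types.CLOB);       // 2005: not supported, causes the error
            // ps.setNull(3, Types.VARCHAR); // the type code that works
            ps.executeUpdate();
        }
    }
}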
Spark is reading from MySQL and writing into Redshift through the JDBC driver with the lines of code below. Basically, the JDBC driver takes care of casting and the other details; I didn't write anything other than these few lines. If you look at the write method, I am just referencing the driver, and the driver takes care of the rest. Read from MySQL to create a DataFrame, then write into Redshift:

Mysql_df1.write.format("jdbc") \
    .option("url", "jdbc:redshift://:5439/dbname?rewriteBatchedStatements = true;LogLevel=6;LogPath=/temp/log") \
    .option("dbtable", "tcms_fpl.Test2") \
    .option("user", "uuuu") \
    .option("password", "ppppp") \
    .option("batchsize", 10) \
    .option("isolationLevel", "NONE") \
    .mode("overwrite") \
    .save()
That means Spark is converting the code without regard to database capability. MySQL must support CLOB, but Redshift does not.
I don't think Spark is converting anything here; it keeps whatever datatypes it has. Here is the Spark DataFrame datatype after reading the data from MySQL:

|-- sal: double (nullable = true)

If you look at the log, the JDBC driver is trying to parse the datatype before inserting into Redshift. This is the trace I captured of the JDBC driver's write logs before it inserts into Redshift, where the driver is preparing the insert query:

Jun 23 15:12:15.076 DEBUG [82 Executor task launch worker for task 0.0 in stage 0.0 (TID 0)] com.amazon.redshift.core.v3.QueryExecutorImpl.sendParse: FE=> Parse(stmt=S_1-2481566321930127,query="INSERT INTO tcms_fpl.Test2 ("emp","sal","dep") VALUES ($1,$2,$3)",oids={0,701,26})
com.amazon.redshift.core.v3.QueryExecutorImpl.processResultsOnThread: <=BE ParseComplete [S_1-2481566321930127]
com.amazon.redshift.core.v3.QueryExecutorImpl.sendBind: FE=> Bind(stmt=S_1-2481566321930127,portal=C_2-2481567062770625,$1=<'sa'>,type=VARCHAR,$2=,type=FLOAT8,$3=,type=OID)
com.amazon.redshift.core.v3.QueryExecutorImpl.sendBind: FE=> Bind(stmt=S_1-2481566321930127,portal=C_3-2481567064637784,$1=<'sa'>,type=UNSPECIFIED,$2=,type=FLOAT8,$3=,type=OID)
com.amazon.redshift.core.v3.QueryExecutorImpl.receiveNoticeResponse: <=BE NoticeResponse(INFO: Function "text(oid)" not supported.

Thanks
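(An aside for readers of the trace, my reading only: assuming the PostgreSQL wire-protocol type OIDs that this driver's v3 core appears to derive from, the Parse message's oids={0,701,26} declares the three parameter types up front, and the null varchar parameter $3 is declared as oid, which is why the server emits the Function "text(oid)" not supported notice when asked to render it as text. A tiny annotation helper:)

import java.util.Map;

// Hypothetical helper mapping the numeric parameter OIDs from the
// Parse(...oids={0,701,26}) line above to their wire-protocol type names.
public final class TraceOidNames {
    static final Map<Integer, String> NAMES = Map.of(
            0,   "UNSPECIFIED (server infers the type)",    // $1 emp
            701, "FLOAT8 (double precision)",               // $2 sal
            26,  "OID (what the CLOB null was mapped to)"); // $3 dep

    public static void main(String[] args) {
        for (int oid : new int[] {0, 701, 26}) {
            System.out.println(oid + " -> " + NAMES.get(oid));
        }
    }
}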
As per the JDBC log, the application is calling setNull with 2005 as the datatype. So please check who is calling setNull with 2005 as the datatype.
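If it helps to pinpoint the caller, here is a hypothetical debugging sketch (the class name and wiring are mine, not part of the driver or any API): it wraps a PreparedStatement in a dynamic proxy and prints a stack trace whenever setNull is called, so you can see which layer passes type code 2005.

import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.sql.PreparedStatement;

// Hypothetical helper, for debugging only.
public final class SetNullTracer {
    public static PreparedStatement trace(PreparedStatement real) {
        InvocationHandler handler = (proxy, method, args) -> {
            if ("setNull".equals(method.getName())) {
                System.err.printf("setNull(param=%s, sqlType=%s) called from:%n",
                        args[0], args[1]);
                new Throwable().printStackTrace(); // show who made the call
            }
            return method.invoke(real, args);
        };
        return (PreparedStatement) Proxy.newProxyInstance(
                PreparedStatement.class.getClassLoader(),
                new Class<?>[] { PreparedStatement.class },
                handler);
    }
}

In this case the caller is Spark itself rather than user code: the stack trace later in this issue goes through org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils, and as far as I can tell Spark's generic JDBC mapping uses CLOB as the null type code for string columns when no database-specific dialect overrides it.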
This is from Types.class of the JDK:

/**
 * The constant in the Java programming language, sometimes referred to
 * as a type code, that identifies the generic SQL type CLOB.
 * @since 1.2
 */
public final static int CLOB = 2005;
OK, thanks for the update. How do I fix the problem? I'd appreciate it if you could guide or advise me on how to take it forward.
As per my understanding, Spark is using that JDBC driver to establish the connection and parse the data to be Redshift compatible, but you're saying something different. I am not sure how to move forward, as Spark is internally generating the code.
Thanks
Subbarao
… On Jun 24, 2021, at 2:02 PM, ilesh garish ***@***.***> wrote:
This is from Types.class of the JDK:
/**
 * The constant in the Java programming language, sometimes referred to
 * as a type code, that identifies the generic SQL type
 * CLOB.
 * @since 1.2
 */
public final static int CLOB = 2005;
I can think of two ways to move forward:
See the similar issue with another Spark connector. Note: we started code changes to treat setNull(CLOB) as setNull(VARCHAR).
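A minimal sketch of that change (my illustration of the idea, not the actual driver source): normalize the unsupported CLOB type code to VARCHAR before binding a null parameter.

import java.sql.Types;

// Illustrative only: the shape of a setNull(CLOB) -> setNull(VARCHAR) remap.
public final class NullTypeMapper {
    static int normalize(int sqlType) {
        // Redshift has no CLOB type; bind null CLOBs as VARCHAR instead.
        return sqlType == Types.CLOB ? Types.VARCHAR : sqlType;
    }

    public static void main(String[] args) {
        System.out.println(normalize(Types.CLOB) == Types.VARCHAR); // prints true
    }
}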
Fixed in 2.0.0.6, which has just been released.
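(If you pull the driver from Maven Central, I believe the artifact is com.amazon.redshift:redshift-jdbc42, so upgrading means picking up version 2.0.0.6 of that artifact; please double-check the coordinates for your setup.)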
Driver version
Redshift version
Client Operating System
JAVA/JVM version
Table schema
Problem description
Expected behaviour:
empid(int)   sal(double)   empname(varchar)
++++++++++   +++++++++++   ++++++++++++++++
10           100.00        'hello'            -- able to insert the records
Error message/stack trace:
redshift_jdbc_connection_1.log
redshift_jdbc_connection_2.log
redshift_jdbc.log
java.sql.BatchUpdateException: Batch entry 0 INSERT INTO tcms_fpl.Test2 ("emp","sal","dep") VALUES ('sa',NULL,NULL) was aborted: ERROR: Specified types or functions (one per INFO message) not supported on Redshift tables. Call getNextException to see other errors in the batch.
at com.amazon.redshift.jdbc.BatchResultHandler.handleCompletion(BatchResultHandler.java:195)
at com.amazon.redshift.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:609)
at com.amazon.redshift.jdbc.RedshiftStatementImpl.internalExecuteBatch(RedshiftStatementImpl.java:978)
at com.amazon.redshift.jdbc.RedshiftStatementImpl.executeBatch(RedshiftStatementImpl.java:1006)
at com.amazon.redshift.jdbc.RedshiftPreparedStatement.executeBatch(RedshiftPreparedStatement.java:1723)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.savePartition(JdbcUtils.scala:692)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.$anonfun$saveTable$1(JdbcUtils.scala:856)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.$anonfun$saveTable$1$adapted(JdbcUtils.scala:854)
at org.apache.spark.rdd.RDD.$anonfun$foreachPartition$2(RDD.scala:1020)
at org.apache.spark.rdd.RDD.$anonfun$foreachPartition$2$adapted(RDD.scala:1020)
at org.apache.spark.SparkContext.$anonfun$runJob$5(SparkContext.scala:2242)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:131)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:497)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1439)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:500)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
... 1 more
Caused by: com.amazon.redshift.util.RedshiftException: ERROR: Specified types or functions (one per INFO message) not supported on Redshift tables.
at com.amazon.redshift.core.v3.QueryExecutorIm
Any other details that can be helpful:
JDBC trace logs
Reproduction code
Mysql_df1 = spark.read.format('jdbc') \
    .option("url", "jdbc:mysql://hostname/dbname") \
    .option("driver", "com.mysql.jdbc.Driver") \
    .option("dbtable", "(select * from schema.Test) as hello") \
    .option("user", "uuuu") \
    .option("password", "ppppp") \
    .load()

Write into Redshift:

Mysql_df1.write.format("jdbc") \
    .option("url", "jdbc:redshift://:5439/dbname?rewriteBatchedStatements = true;LogLevel=6;LogPath=/temp/log") \
    .option("dbtable", "tcms_fpl.Test2") \
    .option("user", "uuuu") \
    .option("password", "ppppp") \
    .option("batchsize", 10) \
    .option("isolationLevel", "NONE") \
    .mode("overwrite") \
    .save()