
Unable to insert null values of varchar datatype into Redshift #27

Closed

aiisyourfuture opened this issue Jun 23, 2021 · 11 comments

@aiisyourfuture

Driver version

Redshift version

Client Operating System

JAVA/JVM version

Table schema

Problem description

  1. Expected behaviour:

     empid(int) | sal(double) | empname(varchar)
     10         | 100.00      | 'hello'    -- able to insert this record

  2. Actual behaviour:

                | 11.11       | null       -- unable to insert this record; throws ERROR: Specified types or functions (one per INFO message) not supported on Redshift tables
    
  3. Error message/stack trace:
    redshift_jdbc_connection_1.log
    redshift_jdbc_connection_2.log
    redshift_jdbc.log

java.sql.BatchUpdateException: Batch entry 0 INSERT INTO tcms_fpl.Test2 ("emp","sal","dep") VALUES ('sa',NULL,NULL) was aborted: ERROR: Specified types or functions (one per INFO message) not supported on Redshift tables. Call getNextException to see other errors in the batch.
at com.amazon.redshift.jdbc.BatchResultHandler.handleCompletion(BatchResultHandler.java:195)
at com.amazon.redshift.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:609)
at com.amazon.redshift.jdbc.RedshiftStatementImpl.internalExecuteBatch(RedshiftStatementImpl.java:978)
at com.amazon.redshift.jdbc.RedshiftStatementImpl.executeBatch(RedshiftStatementImpl.java:1006)
at com.amazon.redshift.jdbc.RedshiftPreparedStatement.executeBatch(RedshiftPreparedStatement.java:1723)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.savePartition(JdbcUtils.scala:692)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.$anonfun$saveTable$1(JdbcUtils.scala:856)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.$anonfun$saveTable$1$adapted(JdbcUtils.scala:854)
at org.apache.spark.rdd.RDD.$anonfun$foreachPartition$2(RDD.scala:1020)
at org.apache.spark.rdd.RDD.$anonfun$foreachPartition$2$adapted(RDD.scala:1020)
at org.apache.spark.SparkContext.$anonfun$runJob$5(SparkContext.scala:2242)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:131)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:497)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1439)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:500)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
... 1 more
Caused by: com.amazon.redshift.util.RedshiftException: ERROR: Specified types or functions (one per INFO message) not supported on Redshift tables.
at com.amazon.redshift.core.v3.QueryExecutorIm
4. Any other details that can be helpful:

JDBC trace logs

Reproduction code

Mysql_df1 = (spark.read.format('jdbc')
    .option("url", "jdbc:mysql://hostname/dbname")
    .option("driver", "com.mysql.jdbc.Driver")
    .option("dbtable", "(select * from schema.Test) as hello")
    .option("user", "uuuu")
    .option("password", "ppppp")
    .load())

Write into Redshift:
+++++++++

(Mysql_df1.write.format("jdbc")
    .option("url", "jdbc:redshift://:5439/dbname?rewriteBatchedStatements = true;LogLevel=6;LogPath=/temp/log")
    .option("dbtable", "tcms_fpl.Test2")
    .option("user", "uuuu")
    .option("password", "ppppp")
    .option("batchsize", 10)
    .option("isolationLevel", "NONE")
    .mode("overwrite")
    .save())

@iggarish
Contributor

Thanks for reporting the issue. We will look into it and get back to you.

@iggarish
Contributor

Could you please check your code at this line:
Jun 23 14:08:59.550 FUNCTION [82 Executor task launch worker for task 0.0 in stage 1.0 (TID 1)] com.amazon.redshift.jdbc.RedshiftPreparedStatement.setNull: Enter (3,2005)

Basically, the application is trying to call setNull with Types.CLOB (i.e. 2005) for the third parameter. That type is not supported by Redshift, which is why you see the error.
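
For illustration, here is a minimal standalone JDBC program that reproduces the failing bind (the host and credentials are placeholders; the table is the one from this issue). Binding the null as VARCHAR instead of CLOB avoids the error:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.Types;

public class SetNullRepro {
    public static void main(String[] args) throws SQLException {
        // Placeholder endpoint and credentials.
        String url = "jdbc:redshift://hostname:5439/dbname";
        try (Connection conn = DriverManager.getConnection(url, "uuuu", "ppppp");
             PreparedStatement ps = conn.prepareStatement(
                     "INSERT INTO tcms_fpl.Test2 (\"emp\",\"sal\",\"dep\") VALUES (?,?,?)")) {
            ps.setString(1, "sa");
            ps.setNull(2, Types.DOUBLE);  // bound as FLOAT8, which Redshift accepts
            ps.setNull(3, Types.CLOB);    // type code 2005 -> "Specified types or functions ... not supported"
            // ps.setNull(3, Types.VARCHAR); // binding the null as VARCHAR works
            ps.executeUpdate();
        }
    }
}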

@aiisyourfuture
Author

Spark is reading from MySQL and writing into Redshift through the JDBC driver with the lines of code below. Basically, the JDBC driver takes care of casting and other conversions; I didn't write anything other than the 4 lines of code below.

Look at the write method: I am just invoking the driver, and the driver is taking care of it.

Reading from MySQL and creating a DataFrame
+++++++++++++++++++++++++++++++
Mysql_df1 = (spark.read.format('jdbc')
    .option("url", "jdbc:mysql://hostname/dbname")
    .option("driver", "com.mysql.jdbc.Driver")
    .option("dbtable", "(select * from schema.Test) as hello")
    .option("user", "uuuu")
    .option("password", "ppppp")
    .load())

Write into Redshift:
+++++++++++++

(Mysql_df1.write.format("jdbc")
    .option("url", "jdbc:redshift://:5439/dbname?rewriteBatchedStatements = true;LogLevel=6;LogPath=/temp/log")
    .option("dbtable", "tcms_fpl.Test2")
    .option("user", "uuuu")
    .option("password", "ppppp")
    .option("batchsize", 10)
    .option("isolationLevel", "NONE")
    .mode("overwrite")
    .save())

@iggarish
Contributor

That means Spark is choosing the type without regard to the target database's capabilities. MySQL must support CLOB, but Redshift does not.

@aiisyourfuture
Author

I don't think Spark is converting anything here; it keeps whatever datatypes it read. Here are the Spark DataFrame datatypes after reading the data from MySQL:

++++++++++++++++++

|-- sal: double (nullable = true)
|-- empID: string (nullable = true)
|-- me-name: string (nullable = true)

++++++++++++++++++++

If you look at the log, the JDBC driver is parsing the datatypes before inserting into Redshift.

The trace I captured shows what the JDBC driver logs before it inserts into Redshift.

The JDBC driver preparing the insert query to insert the data into Redshift:

Jun 23 15:12:15.076 DEBUG [82 Executor task launch worker for task 0.0 in stage 0.0 (TID 0)] com.amazon.redshift.core.v3.QueryExecutorImpl.sendParse: FE=> Parse(stmt=S_1-2481566321930127,query="INSERT INTO tcms_fpl.Test2 ("emp","sal","dep") VALUES ($1,$2,$3)",oids={0,701,26})

com.amazon.redshift.core.v3.QueryExecutorImpl.processResultsOnThread: <=BE ParseComplete [S_1-2481566321930127]
Jun 23 15:12:15.175 DEBUG [82 Executor task launch worker for task 0.0 in stage 0.0 (TID 0)] com.amazon.redshift.core.v3.QueryExecutorImpl.processResultsOnThread: <=BE ParameterDescription

com.amazon.redshift.core.v3.QueryExecutorImpl.sendBind: FE=> Bind(stmt=S_1-2481566321930127,portal=C_2-2481567062770625,$1=<'sa'>,type=VARCHAR,$2=,type=FLOAT8,$3=,type=OID)

com.amazon.redshift.core.v3.QueryExecutorImpl.sendBind: FE=> Bind(stmt=S_1-2481566321930127,portal=C_3-2481567064637784,$1=<'sa'>,type=UNSPECIFIED,$2=,type=FLOAT8,$3=,type=OID)

com.amazon.redshift.core.v3.QueryExecutorImpl.receiveNoticeResponse: <=BE NoticeResponse(INFO: Function "text(oid)" not supported.

Thanks
Subbarao

@iggarish
Contributor

As per the JDBC log, the application is calling setNull with 2005 as the datatype:
RedshiftPreparedStatement.setNull: Enter (3,2005)

So please check what is calling setNull with 2005 as the datatype.

@iggarish
Contributor

This is from the JDK's java.sql.Types class:

/**
 * The constant in the Java programming language, sometimes referred to
 * as a type code, that identifies the generic SQL type
 * CLOB.
 * @since 1.2
 */
public final static int CLOB = 2005;

@aiisyourfuture
Author

aiisyourfuture commented Jun 24, 2021 via email

@iggarish
Contributor

I can think of two ways to move forward:

  1. Create an issue with Spark, if you can (a possible Spark-side workaround is sketched after this list).
  2. We can work around setNull(2005) specifically, but I need to discuss it internally and it may take time. Even with this workaround, if the application calls setClob() for parameter bindings it will still fail. So the ultimate goal should be for the application to understand the types supported by the datasource and do the bindings accordingly.
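
On option 1: the root cause on the Spark side appears to be that Spark's generic JDBC dialect maps string columns to CLOB when binding nulls, which is what produces setNull(3, 2005) here. Under that assumption, one application-side workaround is to register a custom JdbcDialect (a Spark developer API) that binds strings as VARCHAR for jdbc:redshift URLs. This is only a sketch; the class name and the VARCHAR(MAX) choice are illustrative:

import org.apache.spark.sql.jdbc.JdbcDialect;
import org.apache.spark.sql.jdbc.JdbcDialects;
import org.apache.spark.sql.jdbc.JdbcType;
import org.apache.spark.sql.types.DataType;
import org.apache.spark.sql.types.StringType;
import scala.Option;

// Hypothetical dialect, not part of the Redshift driver: a Spark-side workaround sketch.
public class RedshiftVarcharDialect extends JdbcDialect {
    @Override
    public boolean canHandle(String url) {
        return url.startsWith("jdbc:redshift:");
    }

    @Override
    public Option<JdbcType> getJDBCType(DataType dt) {
        if (dt instanceof StringType) {
            // Bind string columns (including their nulls) as VARCHAR instead of CLOB.
            return Option.apply(new JdbcType("VARCHAR(MAX)", java.sql.Types.VARCHAR));
        }
        return Option.empty(); // fall back to Spark's defaults for all other types
    }
}

// Register once before the write, e.g. from Scala/Java driver code:
// JdbcDialects.registerDialect(new RedshiftVarcharDialect());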

@iggarish
Contributor

See a similar issue with another Spark connector:
exasol/spark-connector#46

Note: we have started code changes to treat setNull(CLOB) as setNull(VARCHAR).
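
For reference, a rough sketch of what treating setNull(CLOB) as setNull(VARCHAR) could look like inside the driver's RedshiftPreparedStatement.setNull (this is an illustration of the idea, not the actual patch; bindNull is a hypothetical internal step):

import java.sql.SQLException;
import java.sql.Types;

public class SetNullMappingSketch {
    public void setNull(int parameterIndex, int sqlType) throws SQLException {
        // Redshift has no CLOB type, so downgrade the null's type code to VARCHAR
        // before the bind message is built.
        if (sqlType == Types.CLOB) {
            sqlType = Types.VARCHAR;
        }
        bindNull(parameterIndex, sqlType); // hypothetical internal bind step
    }

    private void bindNull(int parameterIndex, int sqlType) {
        // ... driver internals would go here ...
    }
}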

@iggarish
Contributor

Fixed in 2.0.0.6, which has just been released.
