
[Bug] [seatunnel-spark-new-connector-example] java.lang.String cannot be cast to org.apache.spark.unsafe.types.UTF8String #2067

Closed
2013650523 opened this issue Jun 27, 2022 · 4 comments

@2013650523 (Contributor)

Search before asking

  • I had searched in the issues and found no similar issues.

What happened

Cannot run the local Spark new-connector example (seatunnel-spark-new-connector-example).

SeaTunnel Version

api-draft

SeaTunnel Config

env {
  # You can set spark configuration here
  # see available properties defined by spark: https://spark.apache.org/docs/latest/configuration.html#available-properties
  #job.mode = BATCH
  spark.app.name = "SeaTunnel"
  spark.executor.instances = 2
  spark.executor.cores = 1
  spark.executor.memory = "1g"
  spark.master = local
}

source {
  # This is an example input plugin, **only for testing and demonstrating the input plugin feature**
  FakeSource {
    result_table_name = "fake"
    field_name = "name,age,timestamp"
  }

  # You can also use other input plugins, such as hdfs
  # hdfs {
  #   result_table_name = "accesslog"
  #   path = "hdfs://hadoop-cluster-01/nginx/accesslog"
  #   format = "json"
  # }

  # If you would like to get more information about how to configure seatunnel and see full list of input plugins,
  # please go to https://seatunnel.apache.org/docs/spark/configuration/source-plugins/Fake
}

transform {
  # split data by specific delimiter

  # you can also use other transform plugins, such as sql
  sql {
    sql = "select name,age from fake"
    result_table_name = "sql"
  }

  # If you would like to get more information about how to configure seatunnel and see full list of transform plugins,
  # please go to https://seatunnel.apache.org/docs/spark/configuration/transform-plugins/Split
}

sink {
  # choose stdout output plugin to output data to console
  Console {}

  # you can also use other output plugins, such as hdfs
  # hdfs {
  #   path = "hdfs://hadoop-cluster-01/nginx/accesslog_processed"
  #   save_mode = "append"
  # }

  # If you would like to get more information about how to configure seatunnel and see full list of output plugins,
  # please go to https://seatunnel.apache.org/docs/spark/configuration/sink-plugins/Console
}

Running Command

run

Error Exception

Caused by: java.lang.ClassCastException: java.lang.String cannot be cast to org.apache.spark.unsafe.types.UTF8String
	at org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow$class.getUTF8String(rows.scala:46)
	at org.apache.spark.sql.catalyst.expressions.SpecificInternalRow.getUTF8String(SpecificInternalRow.scala:193)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
	at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
	at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$11$$anon$1.hasNext(WholeStageCodegenExec.scala:619)
	at org.apache.spark.sql.execution.datasources.v2.DataWritingSparkTask$$anonfun$run$3.apply(WriteToDataSourceV2Exec.scala:117)
	at org.apache.spark.sql.execution.datasources.v2.DataWritingSparkTask$$anonfun$run$3.apply(WriteToDataSourceV2Exec.scala:116)
	at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1394)
	at org.apache.spark.sql.execution.datasources.v2.DataWritingSparkTask$.run(WriteToDataSourceV2Exec.scala:146)
	at org.apache.spark.sql.execution.datasources.v2.WriteToDataSourceV2Exec$$anonfun$doExecute$2.apply(WriteToDataSourceV2Exec.scala:67)
	at org.apache.spark.sql.execution.datasources.v2.WriteToDataSourceV2Exec$$anonfun$doExecute$2.apply(WriteToDataSourceV2Exec.scala:66)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
	at org.apache.spark.scheduler.Task.run(Task.scala:121)
	at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:402)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:408)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
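
For readers who hit the same trace: Spark's Catalyst engine stores string columns internally as org.apache.spark.unsafe.types.UTF8String, and BaseGenericInternalRow.getUTF8String performs an unchecked cast, so a connector that writes a raw java.lang.String into an InternalRow fails exactly at the frame above. A minimal, self-contained Java reproduction follows; the class name and field value are made up for illustration, but GenericInternalRow and UTF8String.fromString are real Spark APIs.

import org.apache.spark.sql.catalyst.expressions.GenericInternalRow;
import org.apache.spark.unsafe.types.UTF8String;

public class Utf8CastRepro {
    public static void main(String[] args) {
        GenericInternalRow row = new GenericInternalRow(1);

        // A raw java.lang.String can be written into the row without complaint...
        row.update(0, "fake");
        try {
            // ...but reading it back triggers the unchecked cast in
            // BaseGenericInternalRow.getUTF8String (rows.scala:46 in the trace).
            row.getUTF8String(0);
        } catch (ClassCastException e) {
            // On Java 8: "java.lang.String cannot be cast to
            // org.apache.spark.unsafe.types.UTF8String"
            System.out.println(e.getMessage());
        }

        // Converting to Spark's internal representation first avoids the error.
        row.update(0, UTF8String.fromString("fake"));
        System.out.println(row.getUTF8String(0)); // prints: fake
    }
}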

Flink or Spark Version

No response

Java or Scala Version

No response

Screenshots

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

  • I agree to follow this project's Code of Conduct
@2013650523 2013650523 added the bug label Jun 27, 2022
@Hisoka-X (Member)

Hi, this error will be fixed by #2051 in InternalRowConverter.java.
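
For context on what such a fix typically looks like: the converter has to map engine-neutral Java values onto Catalyst's internal types before they reach an InternalRow. The sketch below only illustrates that idea and is not the actual #2051 patch; the class and the toCatalyst helper are hypothetical.

import org.apache.spark.unsafe.types.UTF8String;

// Illustrative sketch only, not the actual #2051 change.
public final class ConverterSketch {

    // Hypothetical helper: translate a Java value into the representation
    // Catalyst expects inside an InternalRow.
    static Object toCatalyst(Object value) {
        if (value instanceof String) {
            // The crux of this bug: String must become UTF8String.
            return UTF8String.fromString((String) value);
        }
        // Timestamps, decimals, arrays, maps, etc. need similar mappings;
        // primitive numerics can pass through unchanged.
        return value;
    }
}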

@2013650523 (Contributor, Author)

Thanks, following your suggestion solved the problem. May I ask about the new connector sinks (Hive, etc.): are their configuration files still written the same way as before?

@Hisoka-X (Member)

Yep. Developers should try to keep the configuration format as close to the previous one as possible.

@CalvinKirs (Member)

Closed by #2051.
