
Caused by: java.lang.ClassNotFoundException: org.xerial.snappy.Snappy #68

Closed
Lipeng522 opened this issue Dec 16, 2020 · 8 comments · Fixed by #69
Labels
bug Something isn't working

Comments

@Lipeng522

Hi author, when I use this DataX to collect HDFS parquet+snappy files, I ran into an issue. Can you help me find the reason? Thank you.

The Hive snappy setting is:
set hive.intermediate.compression.codec=org.apache.hadoop.io.compress.SnappyCodec;

My job.json looks like:

{
  "job": {
    "setting": {
      "speed": {
        "channel": 3
      }
    },
    "content": [
      {
        "reader": {
          "name": "hdfsreader",
          "parameter": {
            "path": "******",
            "defaultFS": "hdfs://nameservice1",
            "column": [
              { "index": 0, "type": "string" },
              { "index": 1, "type": "string" },
              { "index": 2, "type": "string" },
              { "index": 3, "type": "string" },
              { "index": 4, "type": "string" },
              { "index": 5, "type": "string" },
              { "index": 6, "type": "string" },
              { "index": 7, "type": "string" },
              { "index": 8, "type": "string" },
              { "index": 9, "type": "string" },
              { "index": 10, "type": "string" },
              { "index": 11, "type": "string" },
              { "index": 12, "type": "string" },
              { "index": 13, "type": "string" },
              { "index": 14, "type": "string" },
              { "index": 15, "type": "string" },
              { "index": 16, "type": "string" },
              { "index": 17, "type": "string" },
              { "index": 18, "type": "string" },
              { "index": 19, "type": "string" },
              { "index": 20, "type": "string" },
              { "index": 21, "type": "string" },
              { "index": 22, "type": "string" },
              { "index": 23, "type": "string" },
              { "index": 24, "type": "string" },
              { "index": 25, "type": "string" },
              { "index": 26, "type": "string" },
              { "index": 27, "type": "string" }
            ],
            "fileType": "parquet",
            "encoding": "UTF-8",
            "fieldDelimiter": ",",
            "compress": "hadoop-snappy"
          }
        },
        "writer": {
          "name": "elasticsearchwriter",
          "parameter": {
            "endpoint": "http://********:9200",
            "index": "supp_yunc_recpt_fct_test",
            "type": "type1",
            "cleanup": false,
            "settings": { "index": { "number_of_shards": 2, "number_of_replicas": 1 } },
            "discovery": false,
            "batchSize": 1000,
            "splitter": ",",
            "column": [
              { "name": "id", "type": "id" },
              { "name": "pur_doc_id", "type": "keyword" },
              { "name": "goodsid", "type": "keyword" },
              { "name": "appt_id", "type": "keyword" },
              { "name": "compt_id", "type": "keyword" },
              { "name": "appt_sts", "type": "keyword" },
              { "name": "del_no", "type": "keyword" },
              { "name": "is_gift", "type": "keyword" },
              { "name": "qty", "type": "keyword" },
              { "name": "qty_appt", "type": "keyword" },
              { "name": "qty_proce", "type": "keyword" },
              { "name": "qty_proce_base", "type": "keyword" },
              { "name": "qty_compt", "type": "keyword" },
              { "name": "appt_time", "type": "keyword" },
              { "name": "price", "type": "keyword" },
              { "name": "price_no_tax", "type": "keyword" },
              { "name": "sap_del_no", "type": "keyword" },
              { "name": "sap_rownum", "type": "keyword" },
              { "name": "act_time", "type": "keyword" },
              { "name": "remark", "type": "keyword" },
              { "name": "create_time", "type": "keyword" },
              { "name": "creator", "type": "keyword" },
              { "name": "updated_time", "type": "keyword" },
              { "name": "updated_by", "type": "keyword" },
              { "name": "last_updated_time", "type": "keyword" },
              { "name": "insert_time", "type": "keyword" },
              { "name": "sdt", "type": "keyword" }
            ]
          }
        }
      }
    ]
  }
}

The error is:
Exception in thread "job-0" java.lang.NoClassDefFoundError: org/xerial/snappy/Snappy
at org.apache.parquet.hadoop.codec.SnappyDecompressor.decompress(SnappyDecompressor.java:62)
at org.apache.parquet.hadoop.codec.NonBlockedDecompressorStream.read(NonBlockedDecompressorStream.java:51)
at java.io.DataInputStream.readFully(DataInputStream.java:195)
at java.io.DataInputStream.readFully(DataInputStream.java:169)
at org.apache.parquet.bytes.BytesInput$StreamBytesInput.toByteArray(BytesInput.java:279)
at org.apache.parquet.bytes.BytesInput.toByteBuffer(BytesInput.java:230)
at org.apache.parquet.column.values.dictionary.PlainValuesDictionary$PlainBinaryDictionary.<init>(PlainValuesDictionary.java:91)
at org.apache.parquet.column.values.dictionary.PlainValuesDictionary$PlainBinaryDictionary.<init>(PlainValuesDictionary.java:74)
at org.apache.parquet.column.Encoding$1.initDictionary(Encoding.java:88)
at org.apache.parquet.column.Encoding$4.initDictionary(Encoding.java:147)
at org.apache.parquet.column.impl.ColumnReaderBase.<init>(ColumnReaderBase.java:383)
at org.apache.parquet.column.impl.ColumnReaderImpl.<init>(ColumnReaderImpl.java:46)
at org.apache.parquet.column.impl.ColumnReadStoreImpl.getColumnReader(ColumnReadStoreImpl.java:84)
at org.apache.parquet.io.RecordReaderImplementation.<init>(RecordReaderImplementation.java:271)
at org.apache.parquet.io.MessageColumnIO$1.visit(MessageColumnIO.java:147)
at org.apache.parquet.io.MessageColumnIO$1.visit(MessageColumnIO.java:109)
at org.apache.parquet.filter2.compat.FilterCompat$NoOpFilter.accept(FilterCompat.java:165)
at org.apache.parquet.io.MessageColumnIO.getRecordReader(MessageColumnIO.java:109)
at org.apache.parquet.hadoop.InternalParquetRecordReader.checkRead(InternalParquetRecordReader.java:137)
at org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:222)
at org.apache.parquet.hadoop.ParquetReader.read(ParquetReader.java:132)
at org.apache.parquet.hadoop.ParquetReader.read(ParquetReader.java:136)
at com.alibaba.datax.plugin.reader.hdfsreader.DFSUtil.isParquetFile(DFSUtil.java:893)
at com.alibaba.datax.plugin.reader.hdfsreader.DFSUtil.checkHdfsFileType(DFSUtil.java:741)
at com.alibaba.datax.plugin.reader.hdfsreader.DFSUtil.addSourceFileByType(DFSUtil.java:222)
at com.alibaba.datax.plugin.reader.hdfsreader.DFSUtil.addSourceFileIfNotEmpty(DFSUtil.java:152)
at com.alibaba.datax.plugin.reader.hdfsreader.DFSUtil.getHDFSAllFilesNORegex(DFSUtil.java:209)
at com.alibaba.datax.plugin.reader.hdfsreader.DFSUtil.getHDFSAllFiles(DFSUtil.java:179)
at com.alibaba.datax.plugin.reader.hdfsreader.DFSUtil.getAllFiles(DFSUtil.java:141)
at com.alibaba.datax.plugin.reader.hdfsreader.HdfsReader$Job.prepare(HdfsReader.java:172)
at com.alibaba.datax.core.job.JobContainer.prepareJobReader(JobContainer.java:702)
at com.alibaba.datax.core.job.JobContainer.prepare(JobContainer.java:312)
at com.alibaba.datax.core.job.JobContainer.start(JobContainer.java:115)
at com.alibaba.datax.core.Engine.start(Engine.java:90)
at com.alibaba.datax.core.Engine.entry(Engine.java:151)
at com.alibaba.datax.core.Engine.main(Engine.java:169)
Caused by: java.lang.ClassNotFoundException: org.xerial.snappy.Snappy
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
... 36 more

@Lipeng522 Lipeng522 added the bug Something isn't working label Dec 16, 2020
@wgzhao
Owner

wgzhao commented Dec 16, 2020

Thanks for the feedback. Yes, this is a known bug that I have already fixed on the master branch.
The problem occurred because the hdfsreader module's pom.xml did not include the snappy-java dependency.

You can copy plugin/writer/hdfswriter/libs/snappy-java-1.1.7.3.jar to the plugin/reader/hdfsreader/libs directory and then re-run your job; you should no longer get the above error.

Thanks again for your feedback!
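For reference, the underlying fix described above amounts to a dependency declaration in the hdfsreader module's pom.xml. This is a sketch, assuming the module manages its own Maven dependencies; the version is taken from the bundled jar mentioned above:

```xml
<!-- Sketch of the missing dependency in hdfsreader/pom.xml (version matches
     the snappy-java-1.1.7.3.jar shipped with hdfswriter). -->
<dependency>
  <groupId>org.xerial.snappy</groupId>
  <artifactId>snappy-java</artifactId>
  <version>1.1.7.3</version>
</dependency>
```

With this in place, the snappy-java jar ends up in the plugin's libs directory at build time, which is exactly what the manual copy works around.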

@Lipeng522
Author

Thank you for your help; that bug is solved. But when I write into ES, another problem occurred.
I need your help; looking forward to your reply.

My ES version is 6.7.1.
The error is:

2020-12-16 12:35:06.672 [job-0] INFO JobContainer - PerfTrace not enable!
Exception in thread "job-0" java.lang.NoClassDefFoundError: io/searchbox/client/config/ElasticsearchVersion
at io.searchbox.client.config.HttpClientConfig$Builder.<init>(HttpClientConfig.java:125)
at com.alibaba.datax.plugin.writer.elasticsearchwriter.ESClient.createClient(ESClient.java:57)
at com.alibaba.datax.plugin.writer.elasticsearchwriter.ESWriter$Job.prepare(ESWriter.java:57)
at com.alibaba.datax.core.job.JobContainer.prepareJobWriter(JobContainer.java:711)
at com.alibaba.datax.core.job.JobContainer.prepare(JobContainer.java:313)
at com.alibaba.datax.core.job.JobContainer.start(JobContainer.java:115)
at com.alibaba.datax.core.Engine.start(Engine.java:90)
at com.alibaba.datax.core.Engine.entry(Engine.java:151)
at com.alibaba.datax.core.Engine.main(Engine.java:169)
Caused by: java.lang.ClassNotFoundException: io.searchbox.client.config.ElasticsearchVersion
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
... 9 more

@wgzhao
Owner

wgzhao commented Dec 16, 2020

I have reproduced your problem on my machine, and I will fix it quickly.

@wgzhao wgzhao linked a pull request Dec 16, 2020 that will close this issue
@wgzhao
Owner

wgzhao commented Dec 16, 2020

I have fixed the problem.
You can replace the plugin/writer/elasticsearch folder on your machine with the one in the zip file, then test it.
It should solve your problem.

Thanks for your feedback!

@wgzhao
Owner

wgzhao commented Dec 16, 2020

I have sent it.

@Lipeng522
Author

Thank you. A new issue occurred:

Exception in thread "job-0" java.lang.NoSuchMethodError: com.google.gson.JsonObject.size()I
at io.searchbox.indices.CreateIndex.getData(CreateIndex.java:63)
at io.searchbox.client.http.JestHttpClient.prepareRequest(JestHttpClient.java:119)
at io.searchbox.client.http.JestHttpClient.execute(JestHttpClient.java:67)
at io.searchbox.client.http.JestHttpClient.execute(JestHttpClient.java:63)
at com.alibaba.datax.plugin.writer.elasticsearchwriter.ESClient.createIndex(ESClient.java:119)
at com.alibaba.datax.plugin.writer.elasticsearchwriter.ESWriter$Job.prepare(ESWriter.java:80)
at com.alibaba.datax.core.job.JobContainer.prepareJobWriter(JobContainer.java:711)
at com.alibaba.datax.core.job.JobContainer.prepare(JobContainer.java:313)
at com.alibaba.datax.core.job.JobContainer.start(JobContainer.java:115)
at com.alibaba.datax.core.Engine.start(Engine.java:90)
at com.alibaba.datax.core.Engine.entry(Engine.java:151)
at com.alibaba.datax.core.Engine.main(Engine.java:169)

Maybe you can upload your new code to GitHub, and I will download the new tar.gz file.
Thanks very much for sharing.

@wgzhao
Owner

wgzhao commented Dec 16, 2020

Take a look at the version of the gson jar in libs/; you can update it to 2.8.x.
I met the same problem, and replacing it with a higher version resolved it.
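The `NoSuchMethodError` on `JsonObject.size()` means the bundled Gson predates that method. If you build from source, the equivalent fix is to pin a newer Gson in the elasticsearchwriter module's pom.xml. This is a sketch, assuming the module uses Maven like the rest of the project; 2.8.6 is one concrete 2.8.x release:

```xml
<!-- Sketch: pin a 2.8.x Gson so JsonObject.size() (used by Jest) exists. -->
<dependency>
  <groupId>com.google.code.gson</groupId>
  <artifactId>gson</artifactId>
  <version>2.8.6</version>
</dependency>
```

If you are patching an existing install instead, swapping the gson jar in the plugin's libs directory for a 2.8.x jar achieves the same thing, as described above.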

@Lipeng522
Author

OK, the app is working now, thank you very much. But now there is too much dirty data, so all the data transfers failed.
