....
2020-12-11 21:27:24.903 [job-0] INFO DFSUtil - get HDFS all files in path = [/tmp/out_orc]
2020-12-11 21:27:26.459 [job-0] ERROR DFSUtil - Failed to check the type of file [hdfs://sandbox-hdp.hortonworks.com:8020/tmp/out_orc/test_none__48bb0c2c_c520_4406_ab12_8039dc277296]. Five file formats are currently supported: ORC, SEQUENCE, RCFile, TEXT and CSV. Please check that your file type and file are correct.
2020-12-11 21:27:26.472 [job-0] INFO StandAloneJobContainerCommunicator - Total 0 records, 0 bytes | Speed 0B/s, 0 records/s | Error 0 records, 0 bytes | All Task WaitWriterTime 0.000s | All Task WaitReaderTime 0.000s | Percentage 0.00%
2020-12-11 21:27:26.474 [job-0] ERROR Engine - Code:[HdfsReader-10], Description:[Error reading file]. - Code:[HdfsReader-10], Description:[Error reading file]. - Failed to check the type of file [hdfs://sandbox-hdp.hortonworks.com:8020/tmp/out_orc/test_none__48bb0c2c_c520_4406_ab12_8039dc277296]. Five file formats are currently supported: ORC, SEQUENCE, RCFile, TEXT and CSV. Please check that your file type and file are correct. - java.lang.RuntimeException: hdfs://sandbox-hdp.hortonworks.com:8020/tmp/out_orc/test_none__48bb0c2c_c520_4406_ab12_8039dc277296 is not a Parquet file. expected magic number at tail [80, 65, 82, 49] but found [101, 115, 116, 10]
at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:531)
at org.apache.parquet.hadoop.ParquetFileReader.<init>(ParquetFileReader.java:712)
at org.apache.parquet.hadoop.ParquetFileReader.open(ParquetFileReader.java:609)
at org.apache.parquet.hadoop.ParquetReader.initReader(ParquetReader.java:152)
at org.apache.parquet.hadoop.ParquetReader.read(ParquetReader.java:135)
at com.alibaba.datax.plugin.reader.hdfsreader.DFSUtil.isParquetFile(DFSUtil.java:893)
at com.alibaba.datax.plugin.reader.hdfsreader.DFSUtil.checkHdfsFileType(DFSUtil.java:724)
at com.alibaba.datax.plugin.reader.hdfsreader.DFSUtil.addSourceFileByType(DFSUtil.java:222)
at com.alibaba.datax.plugin.reader.hdfsreader.DFSUtil.addSourceFileIfNotEmpty(DFSUtil.java:152)
at com.alibaba.datax.plugin.reader.hdfsreader.DFSUtil.getHDFSAllFilesNORegex(DFSUtil.java:209)
at com.alibaba.datax.plugin.reader.hdfsreader.DFSUtil.getHDFSAllFiles(DFSUtil.java:179)
at com.alibaba.datax.plugin.reader.hdfsreader.DFSUtil.getAllFiles(DFSUtil.java:141)
at com.alibaba.datax.plugin.reader.hdfsreader.HdfsReader$Job.prepare(HdfsReader.java:172)
at com.alibaba.datax.core.job.JobContainer.prepareJobReader(JobContainer.java:702)
at com.alibaba.datax.core.job.JobContainer.prepare(JobContainer.java:312)
at com.alibaba.datax.core.job.JobContainer.start(JobContainer.java:115)
at com.alibaba.datax.core.Engine.start(Engine.java:90)
at com.alibaba.datax.core.Engine.entry(Engine.java:151)
at com.alibaba.datax.core.Engine.main(Engine.java:169)
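The byte values in the `RuntimeException` above are decimal ASCII codes. The small Java sketch below (the class name `MagicBytes` is hypothetical and not part of DataX) decodes them: the expected tail `[80, 65, 82, 49]` is `PAR1`, the Parquet magic number, while the file actually ends in `[101, 115, 116, 10]`, i.e. `est` followed by a newline. That is consistent with the tail of a newline-terminated plain text file, so the file itself looks fine; the failure appears to come from probing it as Parquet.

```java
public class MagicBytes {
    // Turns the decimal byte values printed in the exception into an ASCII string.
    public static String decode(int[] values) {
        StringBuilder sb = new StringBuilder();
        for (int v : values) {
            sb.append((char) v);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        int[] expected = {80, 65, 82, 49};  // "expected magic number at tail"
        int[] found = {101, 115, 116, 10};  // "but found"
        System.out.println("expected: " + decode(expected));
        // Escape the newline so it is visible when printed.
        System.out.println("found:    " + decode(found).replace("\n", "\\n"));
    }
}
```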
Runtime environment
OS: CentOS 7.7.1908
JDK Version: openjdk 14
DataX Version: 3.1.4
Describe the bug
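The hdfsreader plugin reports an error when reading a text file.
The job JSON file that was run is as follows:
The execution result is as follows: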
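Judging from the stack trace, `DFSUtil.isParquetFile` opens the file with `ParquetReader`, and the `RuntimeException` from `ParquetFileReader.readFooter` escapes it instead of being treated as "not Parquet", so `checkHdfsFileType` reports the whole check as failed. The Java sketch below illustrates the intended behaviour under loudly stated assumptions: the class `FileTypeSniffer`, its `Kind` enum and its methods are hypothetical, it reads a local file with `RandomAccessFile` rather than an HDFS stream, and it only recognises ORC (header `ORC`), SequenceFile (header `SEQ`) and Parquet (`PAR1` at both head and tail). It is not DataX's actual detection logic; the point is that an unrecognised file falls back to TEXT rather than throwing.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

public class FileTypeSniffer {
    // Hypothetical result type; DataX's DFSUtil uses its own constants.
    public enum Kind { ORC, SEQUENCE, PARQUET, TEXT }

    private static final byte[] ORC = "ORC".getBytes(StandardCharsets.US_ASCII);
    private static final byte[] SEQ = "SEQ".getBytes(StandardCharsets.US_ASCII);
    private static final byte[] PAR1 = "PAR1".getBytes(StandardCharsets.US_ASCII);

    private static boolean startsWith(byte[] data, byte[] prefix) {
        return data.length >= prefix.length
                && Arrays.equals(Arrays.copyOf(data, prefix.length), prefix);
    }

    // Pure classification from the first and last 4 bytes of a file.
    // Unrecognised content falls back to TEXT instead of throwing.
    public static Kind classify(byte[] head, byte[] tail, long length) {
        if (startsWith(head, ORC)) return Kind.ORC;
        if (startsWith(head, SEQ)) return Kind.SEQUENCE;
        // Parquet has "PAR1" at both ends, so a valid file is at least 8 bytes long.
        if (length >= 8 && startsWith(head, PAR1) && Arrays.equals(tail, PAR1)) {
            return Kind.PARQUET;
        }
        return Kind.TEXT;
    }

    // Reads up to n bytes at offset from a local file (stand-in for an HDFS stream).
    private static byte[] read(RandomAccessFile f, long offset, int n) throws IOException {
        byte[] buf = new byte[n];
        f.seek(offset);
        int total = 0;
        while (total < n) {
            int r = f.read(buf, total, n - total);
            if (r < 0) break;
            total += r;
        }
        return Arrays.copyOf(buf, total);
    }

    public static Kind detect(Path path) throws IOException {
        try (RandomAccessFile f = new RandomAccessFile(path.toFile(), "r")) {
            long len = f.length();
            byte[] head = read(f, 0, 4);
            byte[] tail = read(f, Math.max(0, len - 4), 4);
            return classify(head, tail, len);
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("sniff", ".txt");
        Files.write(tmp, "a line of test\n".getBytes(StandardCharsets.US_ASCII));
        System.out.println(detect(tmp)); // prints TEXT
        Files.delete(tmp);
    }
}
```

Magic bytes are only a heuristic: a text file that happens to start with `SEQ` would be misclassified, which is one reason a real reader may prefer to attempt opening the file and catch the failure.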