[Bug] [seatunnel-engine-storage] IMap and checkpoint writes to HDFS fail when the Kerberos ticket expires after 24 hours #7102

Open
weipengfei-sj opened this issue Jul 4, 2024 · 8 comments · May be fixed by #7995

Comments

@weipengfei-sj
Copy link

Search before asking

  • I had searched in the issues and found no similar issues.

What happened

We are running SeaTunnel 2.3.5 in a 3-node cluster. hazelcast.yaml is configured as shown in the SeaTunnel Config section below.
With the map store configured to write to HDFS, after the cluster had been running for more than 24 hours the service logs showed Kerberos ticket expiration errors on HDFS writes.

Analysis of the source code:
If HDFS writes are authenticated this way, with no logic to automatically refresh the ticket, ticket expiration is inevitable.
[screenshot of the relevant source code]
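For context, here is a minimal sketch of the one-time keytab login pattern described above, using Hadoop's UserGroupInformation API. The class name is illustrative and the principal, keytab, and krb5 paths are copied from the config in this report; the actual SeaTunnel code may differ.

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.security.UserGroupInformation;

    public class OneTimeKerberosLogin {
        public static FileSystem open() throws IOException {
            // krb5.conf must be set before the first Kerberos call
            System.setProperty("java.security.krb5.conf",
                    "/app/linkis/seatunnel/config/krb5.conf");
            Configuration conf = new Configuration();
            conf.set("fs.defaultFS", "hdfs://fss:8020");
            conf.set("hadoop.security.authentication", "kerberos");
            UserGroupInformation.setConfiguration(conf);
            // One-time login: the TGT obtained here has a fixed lifetime
            // (commonly 24h) and nothing below ever renews it.
            UserGroupInformation.loginUserFromKeytab("hdfs",
                    "/applinkis/ceph/share/hadoopcluster/fss/keytab/hdfs.keytab");
            return FileSystem.get(conf); // cached client bound to this login
        }
    }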

I tried modifying the code so that, after authentication, a scheduled task is started to refresh the ticket automatically:
[screenshots of the attempted code change]

However, even with this automatic Kerberos ticket refresh in place, HDFS writes still failed after 24 hours with ticket-unavailable errors.
I also tried adding the refresh mechanism in several other places, for example in the HdfsWriter class, but none of it took effect. Could someone from the community please point out what I am missing? Many thanks.
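For reference, a minimal sketch of the scheduled refresh mechanism described above (an assumption about what the screenshots show, not the actual patch): it periodically calls Hadoop's checkTGTAndReloginFromKeytab(), which re-logins from the keytab when the TGT is near expiry and is a no-op otherwise.

    import java.io.IOException;
    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;
    import org.apache.hadoop.security.UserGroupInformation;

    public class KerberosTicketRenewer {
        private final ScheduledExecutorService scheduler =
                Executors.newSingleThreadScheduledExecutor();

        // Periodically re-login from the keytab; the call is cheap while
        // the ticket is still fresh, so an hourly cadence is safe.
        public void start() {
            scheduler.scheduleAtFixedRate(() -> {
                try {
                    UserGroupInformation.getLoginUser().checkTGTAndReloginFromKeytab();
                } catch (IOException e) {
                    // log and retry on the next tick
                }
            }, 1, 1, TimeUnit.HOURS);
        }
    }

As the rest of the thread shows, a relogin task alone did not resolve the failures reported here.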

SeaTunnel Version

2.3.5

SeaTunnel Config

hazelcast.yaml is configured as follows:
  map:
    engine*:
       map-store:
         enabled: true
         initial-mode: EAGER
         factory-class-name: org.apache.seatunnel.engine.server.persistence.FileMapStoreFactory
         properties:
           type: hdfs
           namespace: /tmp/seatunnel/imap
           clusterName: seatunnel-cluster
           storage.type: hdfs
           fs.defaultFS: hdfs://fss:8020
           kerberosPrincipal: hdfs
           kerberosKeytabFilePath: /applinkis/ceph/share/hadoopcluster/fss/keytab/hdfs.keytab 
           krb5Path: /app/linkis/seatunnel/config/krb5.conf
           seatunnel.hadoop.dfs.nameservices: fss
           seatunnel.hadoop.dfs.ha.namenodes.fss: nn1,nn2
           seatunnel.hadoop.dfs.namenode.rpc-address.fss.nn1: nn1:8020
           seatunnel.hadoop.dfs.namenode.rpc-address.fss.nn2: nn2:8020
           seatunnel.hadoop.dfs.client.failover.proxy.provider.usdp-bing: org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider
           seatunnel.hadoop.dfs.namenode.kerberos.principal: nn/_HOST@T1.COM
           seatunnel.hadoop.dfs.datanode.kerberos.principal: dn/_HOST@T1.COM
           seatunnel.hadoop.rpc.protection: authentication
           seatunnel.hadoop.security.authentication: kerberos
           hdfs_site_path: /applinkis/ceph/share/hadoopcluster/fss/hadoop/hdfs-site.xml

Running Command

./bin/seatunnel.sh  -c config/test-source-kerberos-kafka.yaml

Error Exception

2024-07-03 15:12:50,607 WARN  [o.a.h.i.Client                ] [LeaseRenewer:hdfs@fsst1] - Exception encountered while connecting to the server
javax.security.sasl.SaslException: GSS initiate failed
        at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:211) ~[?:1.8.0_181]
        at org.apache.hadoop.security.SaslRpcClient.saslConnect(SaslRpcClient.java:408) ~[seatunnel-hadoop3-3.1.4-uber.jar:2.3.5]
        at org.apache.hadoop.ipc.Client$Connection.setupSaslConnection(Client.java:622) ~[seatunnel-hadoop3-3.1.4-uber.jar:2.3.5]
        at org.apache.hadoop.ipc.Client$Connection.access$2300(Client.java:413) ~[seatunnel-hadoop3-3.1.4-uber.jar:2.3.5]
        at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:822) ~[seatunnel-hadoop3-3.1.4-uber.jar:2.3.5]
        at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:818) ~[seatunnel-hadoop3-3.1.4-uber.jar:2.3.5]
        at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_181]
        at javax.security.auth.Subject.doAs(Subject.java:422) ~[?:1.8.0_181]
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729) ~[seatunnel-hadoop3-3.1.4-uber.jar:2.3.5]
        at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:818) ~[seatunnel-hadoop3-3.1.4-uber.jar:2.3.5]
        at org.apache.hadoop.ipc.Client$Connection.access$3800(Client.java:413) ~[seatunnel-hadoop3-3.1.4-uber.jar:2.3.5]
        ...
        at com.sun.proxy.$Proxy34.fsync(Unknown Source) ~[?:?]
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.fsync(ClientNamenodeProtocolTranslatorPB.java:984) ~[seatunnel-hadoop3-3.1.4-uber.jar:2.3.5]
        at sun.reflect.GeneratedMethodAccessor107.invoke(Unknown Source) ~[?:?]
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_181]
        at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_181]
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422) ~[seatunnel-hadoop3-3.1.4-uber.jar:2.3.5]
        at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165) ~[seatunnel-hadoop3-3.1.4-uber.jar:2.3.5]
        at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157) ~[seatunnel-hadoop3-3.1.4-uber.jar:2.3.5]
        at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95) ~[seatunnel-hadoop3-3.1.4-uber.jar:2.3.5]
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359) ~[seatunnel-hadoop3-3.1.4-uber.jar:2.3.5]
        at com.sun.proxy.$Proxy35.fsync(Unknown Source) ~[?:?]
        at org.apache.hadoop.hdfs.DFSOutputStream.flushOrSync(DFSOutputStream.java:706) ~[seatunnel-hadoop3-3.1.4-uber.jar:2.3.5]
        at org.apache.hadoop.hdfs.DFSOutputStream.hsync(DFSOutputStream.java:604) ~[seatunnel-hadoop3-3.1.4-uber.jar:2.3.5]
        at org.apache.hadoop.hdfs.client.HdfsDataOutputStream.hsync(HdfsDataOutputStream.java:96) ~[seatunnel-hadoop3-3.1.4-uber.jar:2.3.5]
        at org.apache.seatunnel.engine.imap.storage.file.wal.writer.HdfsWriter.flush(HdfsWriter.java:87) ~[seatunnel-starter.jar:2.3.5]
        at org.apache.seatunnel.engine.imap.storage.file.wal.writer.HdfsWriter.write(HdfsWriter.java:101) ~[seatunnel-starter.jar:2.3.5]
        at org.apache.seatunnel.engine.imap.storage.file.wal.writer.HdfsWriter.write(HdfsWriter.java:80) ~[seatunnel-starter.jar:2.3.5]
        at org.apache.seatunnel.engine.imap.storage.file.wal.writer.HdfsWriter.write(HdfsWriter.java:44) ~[seatunnel-starter.jar:2.3.5]
        at org.apache.seatunnel.engine.imap.storage.file.common.WALWriter.write(WALWriter.java:50) ~[seatunnel-starter.jar:2.3.5]
        at org.apache.seatunnel.engine.imap.storage.file.disruptor.WALWorkHandler.walEvent(WALWorkHandler.java:87) ~[seatunnel-starter.jar:2.3.5]
        at org.apache.seatunnel.engine.imap.storage.file.disruptor.WALWorkHandler.onEvent(WALWorkHandler.java:78) ~[seatunnel-starter.jar:2.3.5]
        at org.apache.seatunnel.engine.imap.storage.file.disruptor.WALWorkHandler.onEvent(WALWorkHandler.java:44) ~[seatunnel-starter.jar:2.3.5]
        at com.lmax.disruptor.WorkProcessor.run(WorkProcessor.java:143) ~[seatunnel-starter.jar:2.3.5]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_181]
Caused by: org.ietf.jgss.GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)
        at sun.security.jgss.krb5.Krb5InitCredential.getInstance(Krb5InitCredential.java:147) ~[?:1.8.0_181]
        at sun.security.jgss.krb5.Krb5MechFactory.getCredentialElement(Krb5MechFactory.java:122) ~[?:1.8.0_181]
        at sun.security.jgss.krb5.Krb5MechFactory.getMechanismContext(Krb5MechFactory.java:187) ~[?:1.8.0_181]
        at sun.security.jgss.GSSManagerImpl.getMechanismContext(GSSManagerImpl.java:224) ~[?:1.8.0_181]
        at sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:212) ~[?:1.8.0_181]
        at sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:179) ~[?:1.8.0_181]
        at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:192) ~[?:1.8.0_181]
        ... 39 more

Zeta or Flink or Spark Version

No response

Java or Scala Version

No response

Screenshots

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

@liunaijie
Member

Hi, based on your issue title and description, the exception happens when writing the IMap or checkpoint, not in the data sync process, right?

My guess is that the FileSystem client is not refreshed, so even after you refresh the credentials, the client keeps using the old ones and hits this issue.
You can try recreating the client to solve it.
[screenshot of the suggested client refresh]
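A minimal sketch of that suggestion, assuming the standard Hadoop FileSystem API (conf is the kerberized Configuration used for the original login; the class and method names are illustrative): FileSystem.get() returns a cached instance tied to the old login, while newInstance() always builds a fresh client.

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.security.UserGroupInformation;

    public class HdfsClientRefresher {
        // Re-login, then replace the HDFS client so it does not keep
        // using connections bound to the expired ticket.
        public static FileSystem refresh(FileSystem stale, Configuration conf)
                throws IOException {
            UserGroupInformation.getLoginUser().checkTGTAndReloginFromKeytab();
            if (stale != null) {
                stale.close(); // release the stale cached client
            }
            return FileSystem.newInstance(conf); // bypasses the FS cache
        }
    }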

@weipengfei-sj
Author

[screenshot of the client-side refresh attempt]
I tried adding a scheduled refresh on the client side; after 24 hours it still reported the Kerberos authentication error.

@weipengfei-sj
Author

@liunaijie could you please take a look?

@liunaijie
Member

@liunaijie could you please take a look?

Hi, please attach all the code you changed, or share a link to the repo.

@shenzhy5

I met the same problem.

@shenzhy5

@liunaijie refreshing the FileSystem cannot solve this problem.

@liunaijie
Member

liunaijie commented Nov 7, 2024

This is an issue; I will take a look.

@liunaijie liunaijie linked a pull request Nov 7, 2024 that will close this issue
@liunaijie
Member

@shenzhy5 @weipengfei-sj I created a PR to fix this. Can you help cherry-pick the commit and verify?
