I guess this is caused by the Hadoop daemons (NameNode, DataNode, JournalNode, ResourceManager, NodeManager …) sharing the same Kerberos keytab (/etc/security/keytabs/spnego.service.keytab) and principal (HTTP/_HOST@{REALM}). I assume that in certain circumstances the DataNode mistakes a legitimate request for a replay attack.

As a workaround, adding the following JVM system property to the Hadoop daemons fixes the issue. It tells the Java process not to use the Kerberos replay cache.

-Dsun.security.krb5.rcache=none
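
One way to pass the property might be through hadoop-env.sh. This is only a sketch: I am assuming the daemons are started by the stock Hadoop scripts, which source hadoop-env.sh and append HADOOP_OPTS to the JVM command line. The file location and the choice of variable are assumptions, and some distributions manage these options through their own tooling or per-daemon variables instead.

```shell
# Sketch for hadoop-env.sh (location and variable are assumptions; adjust to your setup).
# Append the flag to the JVM options used when starting Hadoop processes,
# keeping whatever options were already set.
export HADOOP_OPTS="${HADOOP_OPTS:-} -Dsun.security.krb5.rcache=none"
```

The daemons need a restart to pick up the change. Afterwards you can check a running process's command line (for example with `ps`) to confirm the flag is present.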