Hadoop commands

HDFS Reconfigure without restart https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HDFSCommands.html configurable keys are limited without restart $ hdfs dfsadmin -reconfig namenode nn1.example.com:8020 properties Node [nn1.example.com:8020] Reconfigurable properties: dfs.block.placement.ec.classname dfs.block.replicator.classname dfs.heartbeat.interval dfs.image.parallel.load dfs.namenode.avoid.read.slow.datanode dfs.namenode.block-placement-policy.exclude-slow-nodes.enabled dfs.namenode.heartbeat.recheck-interval dfs.namenode.max.slowpeer.collect.nodes dfs.namenode.replication.max-streams dfs.namenode.replication.max-streams-hard-limit dfs.namenode.replication.work.multiplier.per.iteration dfs.storage.policy.satisfier.mode fs.protected.directories hadoop.caller.context.enabled ipc.8020.backoff.enable

February 5, 2023

About "HADOOP_CLASSPATH" environment variable

https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/UnixShellGuide.html#HADOOP_CLASSPATH In Hadoop ecosystem, HADOOP_CLASSPATH environment variable is commonly used in many places. Hive is use this variable, too. I wonder that how the HADOOP_CLASSPATH variable is used in a script like beeline. I cannot find HADOOP_CLASSPATH variable in Hive source codes. I finally figure out that when executing beeline it uses hadoop jar command. (https://github.com/apache/hive/blob/rel/release-3.1.3/bin/ext/beeline.sh#L35) It uses RunJar.java where HADOOP_CLASSPATH is used to set CLASSPATH. (https://github.com/apache/hadoop/blob/rel/release-3.3.4/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/RunJar.java#L347-L351) If something in Hadoop ecosystem uses RunJar#main, it probably repect HADOOP_CLASSPATH environment variable. ...

February 5, 2023