EUB's second brain

Proxy repository for cargo/rust

Sonatype Nexus community edition 사용 예 3.77 이상의 버전을 써야 ce 버전에서도 cargo repository 를 쓸 수 있다. 단, 3.77 버전부터는 사용량 hard limit 이 존재하게 된다. 다음 명령으로 간단히 로컬에 구동 docker run -d -p 8081:8081 --name nexus sonatype/nexus3 cargo-proxy 로 repository 를 만들었다고 가정, remote url 은 https://index.crates.io 로 한다. .cargo/config.toml 에 다음과 같이 내용 추가. ❯ cat .cargo/config.toml [registries.nexus] index = "sparse+http://localhost:8081/repository/cargo-proxy/" [registry] default = "nexus" [source.crates-io] replace-with = "nexus" [source.nexus] registry = "sparse+http://localhost:8081/repository/cargo-proxy/" https://index.crates.io/ 에 써있는 설명을 보니 sparse 프로토콜을 쓰는 것 같다. 위에 config.toml 에서도 sparse+ 를 빼먹으면 정상동작하지 않는다. ...

Don't sync privileges from ranger to hive

When Apache Ranger is configured for authorization on Secure Hadoop cluster, Hive below 4.x.x synchronizes all the ranger hive policies to rdbms for Hive as default. It is unnecessary and make unnecessary high load on db. See https://issues.apache.org/jira/browse/HIVE-25391. You can disable it with the configurations on HiveServer2 as follows. hive.privilege.synchronizer=false

Monitoring metrics related to "jute.maxbuffer"

There is a configuration named as “jute.maxbuffer” when using zookeeper. This can be set on zookeeper client side or server side. On zookeeper client side, the setting should be lower than that on zookeeper server. If a client gets data bigger than the setting, it will get an error. There are some related issue. https://issues.apache.org/jira/browse/HIVE-21993 https://issues.apache.org/jira/browse/YARN-2962 In order to avoid this errors. Some metrics should be monitored on zookeeper. last_client_response_size or max_client_response_size client_response_size is a response size in bytes from zookeeper server to client. last_proposal_size or max_proposal_size proposal_size is a proposal size in bytes from zookeeper server leader to follower. proposal?: refer to https://zookeeper.apache.org/doc/r3.7.1/zookeeperInternals.html. These values should be lower than jute.maxbuffer. This setting can be set through jvm argument like -Djute.maxbuffer=10485760 (10mb).

Checklist for hive metastore when using mysql

MySQL Index There are some expensive operations for hive metastore when accessing or storing metadatas on RDBMS. Here are some official hive patches for indexing. -- HIVE-21063 CREATE UNIQUE INDEX `NOTIFICATION_LOG_EVENT_ID` ON NOTIFICATION_LOG (`EVENT_ID`) USING BTREE; -- HIVE-21487 CREATE INDEX COMPLETED_COMPACTIONS_RES ON COMPLETED_COMPACTIONS (CC_DATABASE,CC_TABLE,CC_PARTITION); -- HIVE-27165 DROP INDEX TAB_COL_STATS_IDX ON TAB_COL_STATS; CREATE INDEX TAB_COL_STATS_IDX ON TAB_COL_STATS (DB_NAME, TABLE_NAME, COLUMN_NAME, CAT_NAME) USING BTREE; DROP INDEX PCS_STATS_IDX ON PART_COL_STATS; CREATE INDEX PCS_STATS_IDX ON PART_COL_STATS (DB_NAME,TABLE_NAME,COLUMN_NAME,PARTITION_NAME,CAT_NAME) USING BTREE; When you upgrade your hive, there are some changes on tables in rdbms. You can find needed SQLs depending on version at https://github.com/apache/hive/tree/master/standalone-metastore/metastore-server/src/main/sql/mysql. ...

Some cases where "rdns = false" in krb5.conf does not work in Hadoop ecosystem

https://web.mit.edu/kerberos/krb5-1.13/doc/admin/princ_dns.html Operating system bugs may prevent a setting of rdns = false from disabling reverse DNS lookup. Some versions of GNU libc have a bug in getaddrinfo() that cause them to look up PTR records even when not required. MIT Kerberos releases krb5-1.10.2 and newer have a workaround for this problem, as does the krb5-1.9.x series as of release krb5-1.9.4. There are some cases where “rdns = false” in krb5.conf is not respected in Hadoop ecosystem. You can try to modify /etc/hosts or register PTR records to fix this kind of issues. ...

Hadoop commands

HDFS Reconfigure without restart https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HDFSCommands.html configurable keys are limited without restart $ hdfs dfsadmin -reconfig namenode nn1.example.com:8020 properties Node [nn1.example.com:8020] Reconfigurable properties: dfs.block.placement.ec.classname dfs.block.replicator.classname dfs.heartbeat.interval dfs.image.parallel.load dfs.namenode.avoid.read.slow.datanode dfs.namenode.block-placement-policy.exclude-slow-nodes.enabled dfs.namenode.heartbeat.recheck-interval dfs.namenode.max.slowpeer.collect.nodes dfs.namenode.replication.max-streams dfs.namenode.replication.max-streams-hard-limit dfs.namenode.replication.work.multiplier.per.iteration dfs.storage.policy.satisfier.mode fs.protected.directories hadoop.caller.context.enabled ipc.8020.backoff.enable

About "HADOOP_CLASSPATH" environment variable

https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/UnixShellGuide.html#HADOOP_CLASSPATH In Hadoop ecosystem, HADOOP_CLASSPATH environment variable is commonly used in many places. Hive is use this variable, too. I wonder that how the HADOOP_CLASSPATH variable is used in a script like beeline. I cannot find HADOOP_CLASSPATH variable in Hive source codes. I finally figure out that when executing beeline it uses hadoop jar command. (https://github.com/apache/hive/blob/rel/release-3.1.3/bin/ext/beeline.sh#L35) It uses RunJar.java where HADOOP_CLASSPATH is used to set CLASSPATH. (https://github.com/apache/hadoop/blob/rel/release-3.3.4/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/RunJar.java#L347-L351) If something in Hadoop ecosystem uses RunJar#main, it probably repect HADOOP_CLASSPATH environment variable.

SPNEGO-enabled Hadoop DataNode misjudges kerberos "replay attack".

references https://docs.cloudera.com/cloudera-manager/7.5.5/security-troubleshooting/cm-security-troubleshooting.pdf https://search-guard.com/elasticsearch-kibana-kerberos/ I guess that this is caused by sharing the same kerberos keytab (/etc/security/keytabs/spnego.service.keytab) and principal(HTTP/_HOST@{REALM}) among Hadoop daemons (NameNode, DataNode, JournalNodes, ResourceManager, NodeManager …). I assume that DataNode misjudges it is a replay attack in certain circumstances. Adding the following jvm system properties to Hadoop daemons will fix this issue as a workaround. It means java process will not use replay cache. -Dsun.security.krb5.rcache=none