Deploying a Hadoop Cluster
I've been quite interested in big data lately and want to dig into it further, so as a first step I'm once again setting up a Hadoop cluster.
Environment

| Hostname | IP | Roles |
| --- | --- | --- |
| hadoop1 | 10.152.0.14 | NameNode, Secondary NameNode, ResourceManager |
| hadoop2 | 10.152.0.2 | DataNode, NodeManager |
| hadoop3 | 10.152.0.7 | DataNode, NodeManager |
Version constraints
The versions used below are ZooKeeper 3.6.1, Hadoop 3.1.3, and HBase 2.2.5; if you substitute other versions, make sure they are mutually compatible.
Basic node setup
Update packages
```bash
apt update -y && apt upgrade -y
```
Disable the swap partition
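The original doesn't show the commands for this step; a minimal sketch, assuming swap is configured in /etc/fstab:

```bash
# Turn off swap immediately
sudo swapoff -a
# Comment out swap entries so it stays off after a reboot
sudo sed -i '/ swap / s/^/#/' /etc/fstab
```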
Synchronize the time
and add a periodic sync job to crontab:
```
*/10 * * * * ntpdate time.nist.gov
```
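ntpdate is not installed by default on Ubuntu; assuming that's the tool in use here, installing it and doing a one-time sync might look like:

```bash
sudo apt install ntpdate -y
sudo ntpdate time.nist.gov
```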
Change the hostname
```bash
$ vim /etc/cloud/cloud.cfg
# Change `preserve_hostname: false` to `preserve_hostname: true`
preserve_hostname: true

$ echo "hadoop1" > /etc/hostname  # On hadoop2 and hadoop3, use 2 and 3
# Then reboot
```
Add hosts entries
```bash
echo "10.152.0.14 hadoop1
10.152.0.2 hadoop2
10.152.0.7 hadoop3
10.152.0.14 zookeeper1
10.152.0.2 zookeeper2
10.152.0.7 zookeeper3" | sudo tee --append /etc/hosts
```
Create the hadoop user
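The user-creation command itself isn't shown in the original; a typical invocation:

```bash
sudo adduser hadoop
```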
Grant it sudo privileges
```bash
$ vim /etc/sudoers
hadoop ALL=(ALL:ALL) NOPASSWD: ALL
```
Switch to the hadoop user
```bash
sudo su hadoop
cd /home/hadoop
```
Set up passwordless SSH between the nodes
Generate a key pair (as the hadoop user).
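The original omits the command; the usual one, with an empty passphrase and the default key path:

```bash
ssh-keygen -t rsa -b 4096 -N "" -f ~/.ssh/id_rsa
```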
Add each node's public key to the authorized keys list:
```bash
cd ~/.ssh
vim authorized_keys
```
A furious flurry of operations later... I won't spell out the remaining steps here, but the sketch below shows what they usually amount to.
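For completeness, a minimal sketch of the routine, assuming ssh-copy-id is available and the hadoop user exists on every node:

```bash
# Run on each node: push this node's public key to every node (itself included)
ssh-copy-id hadoop@hadoop1
ssh-copy-id hadoop@hadoop2
ssh-copy-id hadoop@hadoop3
# Verify that a passwordless login works
ssh hadoop2 hostname
```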
Install Java
```bash
apt install openjdk-8-jdk-headless openjdk-8-jre -y
```
Note: the JAVA_HOME values later in this guide point at /usr/lib/jvm/java-11-openjdk-amd64/; make sure the JDK you install matches the JAVA_HOME you set (or adjust one of them).
ZooKeeper
Download ZooKeeper
```bash
# Official download page: https://zookeeper.apache.org/releases.html
wget https://wood-bucket.oss-cn-beijing.aliyuncs.com/Linux/apache-zookeeper-3.6.1-bin.tar.gz
tar -xzvf apache-zookeeper-3.6.1-bin.tar.gz
sudo mv apache-zookeeper-3.6.1-bin /usr/local/zookeeper
sudo chown -R hadoop:hadoop /usr/local/zookeeper
cd /usr/local/zookeeper
```
创建zookeeper数据文件夹
1 2 3 4 5
| sudo mkdir -p /data/zookeeper sudo chown -R hadoop:hadoop /data/ cd /data/zookeeper echo "export ZOOKEEPER_HOME=/usr/local/zookeeper" >> /etc/profile source /etc/profile
|
Write the node id
```bash
echo "1" > /data/zookeeper/myid
# On zookeeper2 and zookeeper3, it would be 2 and 3
```
Edit the ZooKeeper configuration
```bash
cd /usr/local/zookeeper/conf
cp zoo_sample.cfg zoo.cfg
```

```bash
$ vim zoo.cfg
# Change the data directory
dataDir=/data/zookeeper
# Allow all four-letter-word commands
4lw.commands.whitelist=*
# Add at the very bottom
server.1=zookeeper1:2888:3888
server.2=zookeeper2:2888:3888
server.3=zookeeper3:2888:3888
```
Manage ZooKeeper with systemctl
sudo vim /etc/systemd/system/zookeeper.service
```ini
[Unit]
Description=zookeeper

[Service]
User=hadoop
WorkingDirectory=/usr/local/zookeeper/
ExecStart=/usr/local/zookeeper/bin/zkServer.sh start-foreground
ExecStop=/usr/local/zookeeper/bin/zkServer.sh stop

[Install]
WantedBy=default.target
```

```bash
sudo systemctl daemon-reload
sudo systemctl enable zookeeper
sudo systemctl start zookeeper
sudo systemctl status zookeeper
```
Test that ZooKeeper is running properly.
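The original doesn't show how; two common checks, using the zkServer.sh script shipped with ZooKeeper and the `stat` four-letter command whitelisted above (the latter assumes netcat is installed):

```bash
# Should report Mode: leader on one node and Mode: follower on the others
/usr/local/zookeeper/bin/zkServer.sh status
# Query a node directly over the client port
echo stat | nc zookeeper1 2181
```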
Hadoop
Download Hadoop
```bash
cd /home/hadoop
wget https://wood-bucket.oss-cn-beijing.aliyuncs.com/Linux/hadoop-3.1.3.tar.gz
tar -xzvf hadoop-3.1.3.tar.gz
sudo mv hadoop-3.1.3 /usr/local/hadoop
sudo chown -R hadoop:hadoop /usr/local/hadoop
```
Add environment variables
vim ~/.bashrc
Append at the bottom:
```bash
export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64/
export PATH=$PATH:$JAVA_HOME/bin
export HBASE_HOME=/usr/local/hbase
export PATH=$PATH:$HBASE_HOME/bin
export HADOOP_HOME=/usr/local/hadoop
export HADOOP_PREFIX=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
export HADOOP_INSTALL=$HADOOP_HOME
```
Make the new environment variables take effect immediately with `source ~/.bashrc`.
Edit the Hadoop configuration files
Edit hadoop-env.sh
vim /usr/local/hadoop/etc/hadoop/hadoop-env.sh
```bash
# Add at the top
export HADOOP_NAMENODE_OPTS=" -Xms1024m -Xmx1024m -XX:+UseParallelGC"
export HADOOP_DATANODE_OPTS=" -Xms1024m -Xmx1024m"
export HADOOP_LOG_DIR=/data/logs/hadoop

# Find the following line and set JAVA_HOME
export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64/
```
Edit core-site.xml
vim /usr/local/hadoop/etc/hadoop/core-site.xml
```xml
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://hadoop1:8020</value>
</property>
```
Note: pasting straight into vi can mangle the indentation, which then has to be fixed by hand; alternatively, write the snippet with tee --append (same for the files below).
Example:
```bash
echo "<property>
  <name>fs.defaultFS</name>
  <value>hdfs://hadoop1:8020</value>
</property>" | sudo tee --append /usr/local/hadoop/etc/hadoop/core-site.xml
```
Edit hdfs-site.xml
vim /usr/local/hadoop/etc/hadoop/hdfs-site.xml
```xml
<!-- Where the NameNode stores its data -->
<property>
  <name>dfs.namenode.name.dir</name>
  <value>file:///data/hadoop/hdfs/nn</value>
</property>
<!-- HDFS replication factor -->
<property>
  <name>dfs.replication</name>
  <value>2</value>
</property>
<!-- Where each DataNode stores its data -->
<property>
  <name>dfs.datanode.data.dir</name>
  <value>file:///data/hadoop/hdfs/dn</value>
</property>
```
Edit yarn-env.sh
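The original doesn't include the yarn-env.sh changes. A hypothetical minimal sketch, assuming the intent is to mirror hadoop-env.sh and send YARN daemon logs to the same directory:

```bash
# Assumption: point YARN daemon logs at the shared log directory
# (in Hadoop 3.x, HADOOP_LOG_DIR in hadoop-env.sh already covers this)
export YARN_LOG_DIR=/data/logs/hadoop
```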
Create the Hadoop log directory
```bash
mkdir -p /data/logs/hadoop
```
Create the data directories
```bash
mkdir -p /data/hadoop/hdfs/nn
mkdir -p /data/hadoop/hdfs/dn
```
Configure the worker nodes
```bash
$ echo "hadoop2
hadoop3" > /usr/local/hadoop/etc/hadoop/slaves

$ echo "hadoop2
hadoop3" > /usr/local/hadoop/etc/hadoop/workers
```
Hadoop 3.x reads the workers file; slaves is the Hadoop 2.x name, written here as well just in case.
Download a supplementary jar
```bash
# On JDKs newer than 1.8 this setup apparently fails with an error about a
# missing component (javax.activation was removed from the JDK in Java 11),
# so fetch the jar manually
cd ${HADOOP_HOME}/share/hadoop/yarn/lib
wget https://repo1.maven.org/maven2/javax/activation/activation/1.1.1/activation-1.1.1.jar
```
Getting ready to start Hadoop
Format the NameNode
(only on hadoop1)
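The format command itself isn't shown in the original; the standard invocation is:

```bash
${HADOOP_HOME}/bin/hdfs namenode -format
```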
Start HDFS
(only on hadoop1)
```bash
cd ${HADOOP_HOME}/sbin
./start-dfs.sh
```
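Optionally, a quick smoke test that the NameNode can see both DataNodes (a standard HDFS admin command, not part of the original walkthrough):

```bash
hdfs dfsadmin -report
```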
Start YARN
(only on hadoop1)
```bash
cd ${HADOOP_HOME}/sbin
./start-yarn.sh
```
Check the running services with the jps command.
If on hadoop1 you see:
```
22177 NameNode
21139 SecondaryNameNode
11213 QuorumPeerMain
23134 Jps
22734 ResourceManager
```
and on hadoop2 and hadoop3:
```
10880 QuorumPeerMain
19970 NodeManager
19142 DataNode
20268 Jps
```
then everything is in order.
If something fails to start, check the corresponding service logs under /data/logs/hadoop.
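For example (the file name follows Hadoop's hadoop-&lt;user&gt;-&lt;daemon&gt;-&lt;host&gt;.log convention, so yours may differ):

```bash
tail -n 100 /data/logs/hadoop/hadoop-hadoop-namenode-hadoop1.log
```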
Hadoop web consoles
Note: the addresses below are on a different network than the 10.152.0.x hosts above; substitute your own node IPs.
NameNode: http://192.168.199.174:9870/dfshealth.html#tab-datanode
DataNode: http://192.168.199.171:9864/datanode.html#tab-overview
Cluster: http://192.168.199.174:8088/cluster
HBase
Download and set up HBase
Download HBase
```bash
wget https://wood-bucket.oss-cn-beijing.aliyuncs.com/Linux/hbase-2.2.5-bin.tar.gz
tar -xzvf hbase-2.2.5-bin.tar.gz
sudo mv hbase-2.2.5 /usr/local/hbase
sudo chown -R hadoop:hadoop /usr/local/hbase
```
Set environment variables
vim ~/.bashrc
```bash
export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64/
export PATH=$PATH:$JAVA_HOME/bin
export HBASE_HOME=/usr/local/hbase
export PATH=$PATH:$HBASE_HOME/bin
```
(If you added these while setting up Hadoop, they are already in ~/.bashrc.)
Apply them:
source ~/.bashrc
Add a user group
```bash
sudo groupadd supergroup
sudo usermod -a -G supergroup hadoop
```
Set up the log directory
```bash
mkdir /data/logs/hbase
chown -R hadoop:hadoop /data/logs/hbase
```
Edit the HBase configuration files
Configure hbase-env.sh
vim /usr/local/hbase/conf/hbase-env.sh
```bash
export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64/
export HBASE_MANAGES_ZK=false
export HBASE_LOG_DIR=/data/logs/hbase
export HBASE_CLASSPATH=/usr/local/hadoop/etc/hadoop
```
Configure hbase-site.xml
vim /usr/local/hbase/conf/hbase-site.xml
```xml
<property>
  <name>hbase.rootdir</name>
  <value>hdfs://hadoop1:8020/hbase</value>
</property>
<property>
  <name>hbase.cluster.distributed</name>
  <value>true</value>
</property>
<property>
  <name>hbase.zookeeper.quorum</name>
  <value>zookeeper1,zookeeper2,zookeeper3</value>
</property>
<property>
  <name>hbase.unsafe.stream.capability.enforce</name>
  <value>false</value>
</property>
```
If hbase.unsafe.stream.capability.enforce is not set, the Master will not come up.
Start HBase
Start the Master on hadoop1:
```bash
cd ${HBASE_HOME}/bin
./hbase-daemon.sh start master
```
Start the RegionServer on hadoop2 and hadoop3:
```bash
cd ${HBASE_HOME}/bin
./hbase-daemon.sh start regionserver
```
Use jps to check that the services started correctly.
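Illustrative output (the PIDs are made up), alongside the Hadoop and ZooKeeper processes from before:

```
# on hadoop1
24301 HMaster
# on hadoop2 and hadoop3
18754 HRegionServer
```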
HBase web consoles
Master: http://192.168.199.174:16010/master-status
RegionServer: http://192.168.199.171:16030/rs-status
References
HBase official documentation: http://hbase.apache.org/book.html#java