
Deploying a Hadoop Cluster

I've been quite interested in big data lately and want to dig into it properly later on, so as a first step I'm trying to set up a Hadoop cluster.

Environment

Hostname    IP            Role
hadoop1     10.152.0.14   Namenode, Secondary Namenode, ResourceManager
hadoop2     10.152.0.2    Datanode, NodeManager
hadoop3     10.152.0.7    Datanode, NodeManager

Basic node setup

Update packages

apt update -y && apt upgrade -y

Disable swap

swapoff -a
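Note that swapoff -a only lasts until the next reboot. To keep swap off permanently, one common approach is to comment out the swap entries in /etc/fstab (a sketch; adjust to your own fstab):

# comment out every fstab line that mounts swap
sudo sed -i '/\sswap\s/ s/^/#/' /etc/fstab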

Synchronize time

ntpdate time.nist.gov

Then add a periodic sync job to crontab:

*/10 * * * * ntpdate time.nist.gov
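If you prefer not to edit the crontab interactively, the entry can also be appended non-interactively (this preserves any existing jobs):

(crontab -l 2>/dev/null; echo "*/10 * * * * ntpdate time.nist.gov") | crontab -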

Change the hostname

$ vim /etc/cloud/cloud.cfg
# change `preserve_hostname: false` to `preserve_hostname: true`
preserve_hostname: true

$ echo "hadoop1" > /etc/hostname
# On hadoop2 and hadoop3, write "hadoop2" and "hadoop3" respectively
# Then reboot
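On systemd-based Ubuntu images, hostnamectl does the same thing in one step (the cloud.cfg change above is still needed so cloud-init does not overwrite it):

sudo hostnamectl set-hostname hadoop1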

Add hosts entries

echo "10.152.0.14 hadoop1
10.152.0.2 hadoop2
10.152.0.7 hadoop3
10.152.0.14 zookeeper1
10.152.0.2 zookeeper2
10.152.0.7 zookeeper3" | sudo tee --append /etc/hosts
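You can confirm the names resolve before moving on:

getent hosts hadoop1 hadoop2 hadoop3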

Create the hadoop user

adduser hadoop

Grant sudo privileges

$ vim /etc/sudoers
hadoop ALL=(ALL:ALL) NOPASSWD: ALL
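Editing /etc/sudoers by hand is unforgiving of syntax errors; a safer equivalent is a drop-in file:

# a drop-in file, so a typo can't break /etc/sudoers itself
echo "hadoop ALL=(ALL:ALL) NOPASSWD: ALL" | sudo tee /etc/sudoers.d/hadoop
sudo chmod 440 /etc/sudoers.d/hadoop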

Switch to the hadoop user

sudo su hadoop
cd /home/hadoop

Set up passwordless SSH between nodes

Generate a key pair (as the hadoop user)

ssh-keygen

Append each node's public key to the authorized keys list

cd ~/.ssh
vim authorized_keys

…then a furious flurry of copy-pasting between nodes ensues. I won't spell out every step here; a sketch of the usual shortcut follows.
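Assuming the hadoop user exists on all three nodes and password logins still work, ssh-copy-id automates the whole key exchange. Run this on every node as hadoop:

# push this node's public key to all nodes (including itself)
for host in hadoop1 hadoop2 hadoop3; do
  ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@$host
done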

Install Java

# JAVA_HOME below points at /usr/lib/jvm/java-11-openjdk-amd64, so install OpenJDK 11
apt install openjdk-11-jdk-headless openjdk-11-jre-headless -y

Zookeeper

Download Zookeeper

# Official downloads page: https://zookeeper.apache.org/releases.html
wget https://wood-bucket.oss-cn-beijing.aliyuncs.com/Linux/apache-zookeeper-3.6.1-bin.tar.gz
tar -xzvf apache-zookeeper-3.6.1-bin.tar.gz
sudo mv apache-zookeeper-3.6.1-bin /usr/local/zookeeper
sudo chown -R hadoop:hadoop /usr/local/zookeeper
cd /usr/local/zookeeper

Create the Zookeeper data directory

sudo mkdir -p /data/zookeeper
sudo chown -R hadoop:hadoop /data/
cd /data/zookeeper
echo "export ZOOKEEPER_HOME=/usr/local/zookeeper" | sudo tee --append /etc/profile
source /etc/profile

Write the node id

echo "1" > /data/zookeeper/myid
# On zookeeper2 and zookeeper3, write "2" and "3" respectively

Edit the Zookeeper config file

cd /usr/local/zookeeper/conf
cp zoo_sample.cfg zoo.cfg
$ vim zoo.cfg
# point the data directory at the one created above
dataDir=/data/zookeeper
# allow all four-letter-word admin commands
4lw.commands.whitelist=*
# append at the bottom: server id, peer port, leader-election port
server.1=zookeeper1:2888:3888
server.2=zookeeper2:2888:3888
server.3=zookeeper3:2888:3888

Manage Zookeeper with systemd

sudo vim /etc/systemd/system/zookeeper.service

[Unit]
Description=zookeeper
[Service]
User=hadoop
WorkingDirectory=/usr/local/zookeeper/
ExecStart=/usr/local/zookeeper/bin/zkServer.sh start-foreground
ExecStop=/usr/local/zookeeper/bin/zkServer.sh stop
[Install]
WantedBy=default.target

sudo systemctl daemon-reload
sudo systemctl enable zookeeper
sudo systemctl start zookeeper
sudo systemctl status zookeeper

Test that Zookeeper is running

nc -vz localhost 2181
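Since the config above whitelists every four-letter command, you can probe a bit deeper than a port check (ruok should answer imok, and status reports each node's leader/follower role):

echo ruok | nc localhost 2181
/usr/local/zookeeper/bin/zkServer.sh status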

Hadoop

Download Hadoop

cd /home/hadoop
wget https://wood-bucket.oss-cn-beijing.aliyuncs.com/Linux/hadoop-3.1.3.tar.gz
tar -xzvf hadoop-3.1.3.tar.gz
sudo mv hadoop-3.1.3 /usr/local/hadoop
sudo chown -R hadoop:hadoop /usr/local/hadoop

Add environment variables

vim ~/.bashrc

Append at the bottom:

export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64/
export PATH=$PATH:$JAVA_HOME/bin
export HBASE_HOME=/usr/local/hbase
export PATH=$PATH:$HBASE_HOME/bin

export HADOOP_HOME=/usr/local/hadoop
export HADOOP_PREFIX=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME

export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
export HADOOP_INSTALL=$HADOOP_HOME

Make the new variables take effect immediately

source ~/.bashrc
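A quick sanity check that the PATH changes took:

hadoop version   # should report Hadoop 3.1.3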

Edit the Hadoop config files

Edit hadoop-env.sh

vim /usr/local/hadoop/etc/hadoop/hadoop-env.sh

# add at the top
export HADOOP_NAMENODE_OPTS=" -Xms1024m -Xmx1024m -XX:+UseParallelGC"
export HADOOP_DATANODE_OPTS=" -Xms1024m -Xmx1024m"
export HADOOP_LOG_DIR=/data/logs/hadoop

# find the following line and set JAVA_HOME
export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64/

Edit core-site.xml

vim /usr/local/hadoop/etc/hadoop/core-site.xml

<property>
<name>fs.defaultFS</name>
<value>hdfs://hadoop1:8020</value>
</property>

Note: pasting XML into vi directly tends to mangle the indentation, so adjust it by hand; alternatively, write it with tee --append (same below). Either way, make sure the properties end up inside the <configuration> element, not after it.

Example

echo "<property>
<name>fs.defaultFS</name>
<value>hdfs://hadoop1:8020</value>
</property>" | sudo tee --append /usr/local/hadoop/etc/hadoop/core-site.xml

Edit hdfs-site.xml

vim /usr/local/hadoop/etc/hadoop/hdfs-site.xml

<!-- where the namenode stores its metadata -->
<property>
<name>dfs.namenode.name.dir</name>
<value>file:///data/hadoop/hdfs/nn</value>
</property>
<!-- HDFS replication factor (2, matching the two datanodes) -->
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<!-- where the datanodes store their blocks -->
<property>
<name>dfs.datanode.data.dir</name>
<value>file:///data/hadoop/hdfs/dn</value>
</property>

Edit yarn-env.sh

Nothing needs to change here, though see the note below about yarn-site.xml.
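This walkthrough also leaves yarn-site.xml at its defaults. If the NodeManagers on hadoop2 and hadoop3 fail to register with the ResourceManager, a minimal yarn-site.xml along these lines usually fixes it (a sketch, assuming the ResourceManager stays on hadoop1; copy it to all three nodes):

<!-- tell the NodeManagers where the ResourceManager lives -->
<property>
<name>yarn.resourcemanager.hostname</name>
<value>hadoop1</value>
</property>
<!-- enable the shuffle service MapReduce jobs need -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>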

Create the Hadoop log directory

mkdir -p /data/logs/hadoop

Create the data directories

mkdir -p /data/hadoop/hdfs/nn
mkdir -p /data/hadoop/hdfs/dn

Configure the worker nodes

# Hadoop 3.x reads the workers file; slaves is the Hadoop 2.x name, written here too for good measure
$ echo "hadoop2
hadoop3" > /usr/local/hadoop/etc/hadoop/slaves

$ echo "hadoop2
hadoop3" > /usr/local/hadoop/etc/hadoop/workers

Download a supplementary jar

# On Java 9+ the javax.activation classes were removed from the JDK,
# so YARN reports a missing-class error unless this jar is added
cd ${HADOOP_HOME}/share/hadoop/yarn/lib
wget https://repo1.maven.org/maven2/javax/activation/activation/1.1.1/activation-1.1.1.jar

Preparing to start Hadoop

Format the Namenode

(on hadoop1 only, and only once; reformatting an existing cluster wipes the HDFS metadata)

hdfs namenode -format

Start HDFS

(on hadoop1 only)

cd ${HADOOP_HOME}/sbin
./start-dfs.sh

Start YARN

(on hadoop1 only)

cd ${HADOOP_HOME}/sbin
./start-yarn.sh

Check the started services with jps

If on hadoop1 you see

22177 NameNode
21139 SecondaryNameNode
11213 QuorumPeerMain
23134 Jps
22734 ResourceManager

and on hadoop2 and hadoop3 you see

10880 QuorumPeerMain
19970 NodeManager
19142 DataNode
20268 Jps

then everything is in order.

If something fails to start, check the corresponding service log under /data/logs/hadoop. A quick functional check follows.
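Beyond jps, a short end-to-end smoke test (run as hadoop on hadoop1) confirms HDFS is actually accepting data:

hdfs dfsadmin -report            # should list 2 live datanodes
echo hello | hdfs dfs -put - /smoke.txt
hdfs dfs -cat /smoke.txt         # prints "hello"
hdfs dfs -rm /smoke.txt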

Hadoop web consoles

Namenode

http://hadoop1:9870/dfshealth.html#tab-datanode

Datanode

http://hadoop2:9864/datanode.html#tab-overview

Cluster

http://hadoop1:8088/cluster

(Substitute whichever address of the node is reachable from your browser.)

HBase

Download and set up HBase

Download HBase

wget https://wood-bucket.oss-cn-beijing.aliyuncs.com/Linux/hbase-2.2.5-bin.tar.gz
tar -xzvf hbase-2.2.5-bin.tar.gz
sudo mv hbase-2.2.5 /usr/local/hbase
sudo chown -R hadoop:hadoop /usr/local/hbase

Set environment variables

vim ~/.bashrc

# these were already added in the Hadoop section above; skip any lines you already have
export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64/
export PATH=$PATH:$JAVA_HOME/bin
export HBASE_HOME=/usr/local/hbase
export PATH=$PATH:$HBASE_HOME/bin

Apply the changes

source ~/.bashrc

Add a user group (supergroup is HDFS's default superuser group, set by dfs.permissions.superusergroup, so membership lets the hadoop user administer HDFS paths)

sudo groupadd supergroup
sudo usermod -a -G supergroup hadoop

Set up the log directory

mkdir /data/logs/hbase
chown -R hadoop:hadoop /data/logs/hbase

Edit the HBase config files

Configure hbase-env.sh

vim /usr/local/hbase/conf/hbase-env.sh

export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64/
export HBASE_MANAGES_ZK=false   # use the external Zookeeper ensemble, not the embedded one
export HBASE_LOG_DIR=/data/logs/hbase
export HBASE_CLASSPATH=/usr/local/hadoop/etc/hadoop

Configure hbase-site.xml

vim /usr/local/hbase/conf/hbase-site.xml

<property>
<name>hbase.rootdir</name>
<value>hdfs://hadoop1:8020/hbase</value>
</property>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>zookeeper1,zookeeper2,zookeeper3</value>
</property>
<property>
<name>hbase.unsafe.stream.capability.enforce</name>
<value>false</value>
</property>

If hbase.unsafe.stream.capability.enforce is not set to false, the HMaster will fail to start.

Start HBase

Start the Master on hadoop1

cd ${HBASE_HOME}/bin
./hbase-daemon.sh start master

Start a Regionserver on hadoop2 and hadoop3

cd ${HBASE_HOME}/bin
./hbase-daemon.sh start regionserver

Use jps to confirm the services started: hadoop1 should now additionally show an HMaster process, and hadoop2/hadoop3 an HRegionServer. A shell-level smoke test follows.
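To verify HBase end to end, a minimal session in the HBase shell (the table and column-family names here are arbitrary examples):

hbase shell
# inside the shell:
create 'smoke', 'cf'
put 'smoke', 'r1', 'cf:c1', 'v1'
scan 'smoke'            # should show the row just written
disable 'smoke'
drop 'smoke'
exit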

HBase web consoles

Master

http://hadoop1:16010/master-status

Regionserver

http://hadoop2:16030/rs-status

References

HBase official documentation: http://hbase.apache.org/book.html#java