etcd: Replacing and Adding Members

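Replacing an etcd member comes down to adding a new instance and then removing an existing one. If the host being replaced can no longer run normally, you must remove the failed member first and then add a new member as usual.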

Test environment: a three-member cluster running etcd 3.2.28

+----------------------------+------------------+---------+---------+-----------+-----------+------------+
| ENDPOINT | ID | VERSION | DB SIZE | IS LEADER | RAFT TERM | RAFT INDEX |
+----------------------------+------------------+---------+---------+-----------+-----------+------------+
| http://192.168.149.63:2379 | 1161d5b4260241e3 | 3.2.28 | 20 MB | false | 8 | 141652 |
| http://192.168.149.62:2379 | 4252aec339d438d9 | 3.2.28 | 20 MB | true | 8 | 141652 |
| http://192.168.149.61:2379 | e6f45ed7d9402b75 | 3.2.28 | 20 MB | false | 8 | 141652 |
+----------------------------+------------------+---------+---------+-----------+-----------+------------+

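Member to replace: 192.168.149.61, to be replaced by 192.168.149.64

Overall procedure (add first, then remove)

  • Run the etcd runtime reconfiguration command to add the new member, and keep a record of its output
  • On the new machine, start the etcd service with the parameters from that output
  • Remove the member being replaced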

Step 1: Add the member through runtime reconfiguration

> ETCDCTL_API=3 etcdctl --endpoints http://192.168.149.61:2379 member add lv-etcd-research-alpha-4 --peer-urls="http://192.168.149.64:2380"
Member ea04db3353b9fd4e added to cluster 2c25150e88501a13

ETCD_NAME="lv-etcd-research-alpha-4"
ETCD_INITIAL_CLUSTER="lv-etcd-research-alpha-1=http://192.168.149.63:2380,lv-etcd-research-alpha-3=http://192.168.149.62:2380,lv-etcd-research-alpha-2=http://192.168.149.61:2380,lv-etcd-research-alpha-4=http://192.168.149.64:2380"
ETCD_INITIAL_CLUSTER_STATE="existing"

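Check the current cluster members. Until its etcd process starts, the new member is listed as unstarted, with no name and no client URL: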

> ETCDCTL_API=3 etcdctl --endpoints http://192.168.149.63:2379,http://192.168.149.62:2379,http://192.168.149.61:2379 member list -w table
+------------------+-----------+--------------------------+----------------------------+----------------------------+
| ID | STATUS | NAME | PEER ADDRS | CLIENT ADDRS |
+------------------+-----------+--------------------------+----------------------------+----------------------------+
| 1161d5b4260241e3 | started | lv-etcd-research-alpha-1 | http://192.168.149.63:2380 | http://192.168.149.63:2379 |
| 4252aec339d438d9 | started | lv-etcd-research-alpha-3 | http://192.168.149.62:2380 | http://192.168.149.62:2379 |
| e6f45ed7d9402b75 | started | lv-etcd-research-alpha-2 | http://192.168.149.61:2380 | http://192.168.149.61:2379 |
| ea04db3353b9fd4e | unstarted | | http://192.168.149.64:2380 | |
+------------------+-----------+--------------------------+----------------------------+----------------------------+

Step 2: Start the service with the parameters from that output

On any running member:

# Optionally, copy the config file from an existing member to the new node
> scp /etc/etcd/etcd.conf root@192.168.149.64:/etc/etcd/

On the new node:

# Edit the config file and set these parameters to the values printed by member add in Step 1
> vim /etc/etcd/etcd.conf

ETCD_NAME="lv-etcd-research-alpha-4"
ETCD_INITIAL_CLUSTER="lv-etcd-research-alpha-1=http://192.168.149.63:2380,lv-etcd-research-alpha-3=http://192.168.149.62:2380,lv-etcd-research-alpha-2=http://192.168.149.61:2380,lv-etcd-research-alpha-4=http://192.168.149.64:2380"
ETCD_INITIAL_CLUSTER_STATE="existing"

Likewise, because the config file was copied from another member, the following parameters must also be updated:

ETCD_LISTEN_PEER_URLS
ETCD_LISTEN_CLIENT_URLS
ETCD_INITIAL_ADVERTISE_PEER_URLS
ETCD_ADVERTISE_CLIENT_URLS
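These edits can also be scripted. A minimal sketch, assuming the config file consists of KEY="value" lines and that GNU sed is available; `set_conf` and `retarget_urls` are hypothetical helper names. A blind search-and-replace of the old IP over the whole file would be wrong, because ETCD_INITIAL_CLUSTER must keep the other members' addresses, so the IP swap is limited to the four keys listed above:

```shell
#!/bin/sh
# Hypothetical helpers for a copied etcd.conf made of KEY="value" lines.
# Assumes GNU sed (in-place -i) and values that contain no '|' or '&'.

# set_conf FILE KEY VALUE: rewrite KEY's line, or append it if missing.
set_conf() {
  file=$1 key=$2 value=$3
  if grep -q "^${key}=" "$file"; then
    # '|' is the sed delimiter because the values contain '/' (URLs)
    sed -i "s|^${key}=.*|${key}=\"${value}\"|" "$file"
  else
    printf '%s="%s"\n' "$key" "$value" >> "$file"
  fi
}

# retarget_urls FILE OLD_IP NEW_IP: swap the host IP only in the four
# per-host URL keys; ETCD_INITIAL_CLUSTER keeps every member's address.
retarget_urls() {
  file=$1 new_ip=$3
  old_re=$(printf '%s' "$2" | sed 's/\./\\./g')   # escape dots for the sed regex
  for key in ETCD_LISTEN_PEER_URLS ETCD_LISTEN_CLIENT_URLS \
             ETCD_INITIAL_ADVERTISE_PEER_URLS ETCD_ADVERTISE_CLIENT_URLS; do
    sed -i "/^${key}=/ s|${old_re}|${new_ip}|g" "$file"
  done
}

# Usage, assuming the file was copied from 192.168.149.63:
#   set_conf /etc/etcd/etcd.conf ETCD_NAME lv-etcd-research-alpha-4
#   set_conf /etc/etcd/etcd.conf ETCD_INITIAL_CLUSTER_STATE existing
#   (ETCD_INITIAL_CLUSTER likewise, with the full value printed by member add)
#   retarget_urls /etc/etcd/etcd.conf 192.168.149.63 192.168.149.64
```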

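Change each of them from the source host's IP to the new host's IP (192.168.149.64); ETCD_INITIAL_CLUSTER keeps the value taken from the member add output. Then start the etcd service: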

systemctl start etcd
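To confirm that the new member actually joined, rather than only that the unit was started, one option is to poll `etcdctl endpoint health` against it until it reports healthy. A minimal sketch; `wait_for` and the `WAIT_INTERVAL` variable are hypothetical names, and the retry count is an arbitrary choice:

```shell
#!/bin/sh
# wait_for TRIES CMD [ARGS...] (hypothetical helper): run CMD until it
# succeeds, up to TRIES attempts, sleeping WAIT_INTERVAL seconds (default 2)
# between attempts. Returns 0 on the first success, 1 if every attempt fails.
wait_for() {
  tries=$1
  shift
  i=1
  while [ "$i" -le "$tries" ]; do
    "$@" && return 0
    [ "$i" -lt "$tries" ] && sleep "${WAIT_INTERVAL:-2}"
    i=$((i + 1))
  done
  return 1
}

# Usage on the new host, right after `systemctl start etcd`
# (`env` is needed because "$@" cannot start with a VAR=value prefix):
#   wait_for 30 env ETCDCTL_API=3 etcdctl \
#     --endpoints http://192.168.149.64:2379 endpoint health
#   journalctl -u etcd -n 50 --no-pager   # read the log if it never turns healthy
```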

Cluster status:

> ETCDCTL_API=3 etcdctl --endpoints http://192.168.149.63:2379,http://192.168.149.62:2379,http://192.168.149.61:2379,http://192.168.149.64:2379 endpoint status -w table
+----------------------------+------------------+---------+---------+-----------+-----------+------------+
| ENDPOINT | ID | VERSION | DB SIZE | IS LEADER | RAFT TERM | RAFT INDEX |
+----------------------------+------------------+---------+---------+-----------+-----------+------------+
| http://192.168.149.63:2379 | 1161d5b4260241e3 | 3.2.28 | 21 MB | false | 8 | 144745 |
| http://192.168.149.62:2379 | 4252aec339d438d9 | 3.2.28 | 21 MB | true | 8 | 144745 |
| http://192.168.149.61:2379 | e6f45ed7d9402b75 | 3.2.28 | 21 MB | false | 8 | 144745 |
| http://192.168.149.64:2379 | ea04db3353b9fd4e | 3.2.28 | 21 MB | false | 8 | 144745 |
+----------------------------+------------------+---------+---------+-----------+-----------+------------+

> ETCDCTL_API=3 etcdctl --endpoints http://192.168.149.63:2379,http://192.168.149.62:2379,http://192.168.149.61:2379 member list -w table
+------------------+---------+--------------------------+----------------------------+----------------------------+
| ID | STATUS | NAME | PEER ADDRS | CLIENT ADDRS |
+------------------+---------+--------------------------+----------------------------+----------------------------+
| 1161d5b4260241e3 | started | lv-etcd-research-alpha-1 | http://192.168.149.63:2380 | http://192.168.149.63:2379 |
| 4252aec339d438d9 | started | lv-etcd-research-alpha-3 | http://192.168.149.62:2380 | http://192.168.149.62:2379 |
| e6f45ed7d9402b75 | started | lv-etcd-research-alpha-2 | http://192.168.149.61:2380 | http://192.168.149.61:2379 |
| ea04db3353b9fd4e | started | lv-etcd-research-alpha-4 | http://192.168.149.64:2380 | http://192.168.149.64:2379 |
+------------------+---------+--------------------------+----------------------------+----------------------------+

Note: endpoint status only queries the endpoints passed in --endpoints, so add the new member's address http://192.168.149.64:2379 to that list.
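Because the --endpoints list grows and shrinks during a replacement, building it from a list of IPs avoids typos. A minimal sketch, assuming every member serves clients on http://IP:2379 as in this cluster; `join_endpoints` is a hypothetical helper name:

```shell
#!/bin/sh
# join_endpoints IP... (hypothetical helper): build the comma-separated
# --endpoints value, assuming each client URL is http://IP:2379.
join_endpoints() {
  printf 'http://%s:2379\n' "$@" | paste -s -d , -
}

# Usage: include the new member 192.168.149.64 so endpoint status reports it.
#   ETCDCTL_API=3 etcdctl --endpoints "$(join_endpoints \
#     192.168.149.63 192.168.149.62 192.168.149.61 192.168.149.64)" \
#     endpoint status -w table
```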

Step 3: Remove the member being replaced

> ETCDCTL_API=3 etcdctl --endpoints http://192.168.149.63:2379 member remove e6f45ed7d9402b75
Member e6f45ed7d9402b75 removed from cluster 2c25150e88501a13
> ETCDCTL_API=3 etcdctl --endpoints http://192.168.149.63:2379 member list -w table
+------------------+---------+--------------------------+----------------------------+----------------------------+
| ID | STATUS | NAME | PEER ADDRS | CLIENT ADDRS |
+------------------+---------+--------------------------+----------------------------+----------------------------+
| 1161d5b4260241e3 | started | lv-etcd-research-alpha-1 | http://192.168.149.63:2380 | http://192.168.149.63:2379 |
| 4252aec339d438d9 | started | lv-etcd-research-alpha-3 | http://192.168.149.62:2380 | http://192.168.149.62:2379 |
| ea04db3353b9fd4e | started | lv-etcd-research-alpha-4 | http://192.168.149.64:2380 | http://192.168.149.64:2379 |
+------------------+---------+--------------------------+----------------------------+----------------------------+
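Here the ID e6f45ed7d9402b75 was copied by hand from the member list table. To lower the risk of removing the wrong member, the ID can also be looked up from the member's peer URL. A minimal sketch; `member_id_by_peer` is a hypothetical helper, and it assumes the default (non-table) `member list` output prints one member per line as `ID, STATUS, NAME, PEER ADDRS, CLIENT ADDRS`, which you should check against your etcdctl version:

```shell
#!/bin/sh
# member_id_by_peer PEER_URL (hypothetical helper): read `etcdctl member list`
# default output on stdin and print the ID of the member advertising PEER_URL.
# Assumed line format (verify with your etcdctl version):
#   ID, STATUS, NAME, PEER ADDRS, CLIENT ADDRS
# where several peer URLs of one member are joined by ",".
member_id_by_peer() {
  awk -F', ' -v peer="$1" '{
    n = split($4, urls, ",")
    for (i = 1; i <= n; i++)
      if (urls[i] == peer) { print $1; next }
  }'
}

# Usage: find the member on 192.168.149.61, then remove it only if found.
#   id=$(ETCDCTL_API=3 etcdctl --endpoints http://192.168.149.63:2379 member list \
#        | member_id_by_peer http://192.168.149.61:2380)
#   [ -n "$id" ] && ETCDCTL_API=3 etcdctl --endpoints http://192.168.149.63:2379 \
#        member remove "$id"
```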

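The member is gone from the list. Once it has been removed, the etcd service on the removed host shuts itself down, and its log shows messages like these: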

removed member e6f45ed7d9402b75 from cluster 2c25150e88501a13
...
the member has been permanently removed from the cluster

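At this point, make sure etcd on that host does not start again: not automatically, not at boot, and not through a restart. (Even if it did start, it would not rejoin the cluster, but to avoid needless errors it is better to shut it down for good; ideally, delete its data directory as well.)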
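A minimal sketch of that cleanup for a systemd host, to be run on 192.168.149.61 after member remove has succeeded. It assumes the data directory is set as ETCD_DATA_DIR in /etc/etcd/etcd.conf; if your unit passes --data-dir on the command line instead, delete that path by hand. `conf_get` and `decommission_etcd` are hypothetical names. `systemctl mask` refuses to work if the unit file itself lives in /etc/systemd/system; in that case `disable` still covers boot-time starts.

```shell
#!/bin/sh
# conf_get FILE KEY (hypothetical helper): print KEY's value from a file of
# KEY="value" lines, with surrounding quotes stripped; empty if KEY is absent.
conf_get() {
  sed -n -E 's/^'"$2"'="?([^"]*)"?$/\1/p' "$1" | tail -n 1
}

# decommission_etcd [CONF] (hypothetical): stop, disable and mask etcd, then
# delete the old member's data directory. Defined here, not called.
decommission_etcd() {
  conf=${1:-/etc/etcd/etcd.conf}
  data_dir=$(conf_get "$conf" ETCD_DATA_DIR)  # read before anything changes
  systemctl stop etcd      # make sure the process is gone
  systemctl disable etcd   # no start at boot
  systemctl mask etcd      # refuse manual or dependency-triggered starts too
  # Delete the old member data only if ETCD_DATA_DIR was found and exists.
  if [ -n "$data_dir" ] && [ -d "$data_dir" ]; then
    rm -rf -- "$data_dir"
  fi
}
```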

Replacing a failed member

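The procedure above assumes that every member is healthy when the member commands run. If one member has failed, you cannot simply add a new member first: remove the failed member, then add the new member as described in Steps 1 and 2.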
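One way to see why the order matters is Raft's majority rule: a cluster of N voting members needs floor(N/2) + 1 of them to commit anything, and a member added with member add counts toward N immediately, even while it is still unstarted. The sketch below only does that arithmetic for the scenario in the text (3 members, 1 failed, new member not yet started); `quorum` and `check` are illustrative names:

```shell
#!/bin/sh
# quorum N: Raft majority for an N-member cluster, floor(N/2) + 1.
quorum() {
  echo $(( $1 / 2 + 1 ))
}

# check LABEL MEMBERS RUNNING: is there still a majority of running members?
check() {
  need=$(quorum "$2")
  if [ "$3" -ge "$need" ]; then verdict=ok; else verdict="no quorum"; fi
  echo "$1: members=$2 quorum=$need running=$3 -> $verdict"
}

# 3-member cluster with 1 failed member; the new member is not started yet.
check "add first" 4 2                  # add first: members=4 quorum=3 running=2 -> no quorum
check "remove failed member" 2 2       # remove failed member: members=2 quorum=2 running=2 -> ok
check "then add new member" 3 2        # then add new member: members=3 quorum=2 running=2 -> ok
```

Removing the failed member first keeps the running members a majority at every step, whereas adding first leaves the cluster without quorum until the new member comes up.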