redis -- Cluster

What a cluster is for:

  • Spreads the access load of a single server across many, achieving load balancing
  • Spreads the storage load of a single server across many, achieving scalability
  • Reduces the business damage when a single server goes down

Data storage design

  1. Compute which slot a key belongs to —> CRC16(key) % 16384 (see the example after this list)
  2. A cluster is essentially an allocation of the 16384 slots, and every node keeps a record of how all the slots are allocated
  3. At most two hops are needed to hit the key: if the node you ask does not own the slot, it redirects you to the node that does
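
Any node can tell you which slot a key maps to via the CLUSTER KEYSLOT command (the key name here is just an example):

127.0.0.1:6379> cluster keyslot foo
(integer) 12182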

Related configuration

  1. Enable cluster mode —> cluster-enabled yes

  2. Cluster config file —> cluster-config-file nodes-6379.conf

    This file is managed by redis itself. If several cluster nodes run on the same machine, make sure the file names differ; otherwise they will overwrite each other and break the cluster.

  3. Node timeout —> cluster-node-timeout 15000 (all three settings are combined in the sample config below)
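
A minimal per-node config putting these together might look like the following sketch; the port and appendonly lines are my additions, not from the settings above:

port 6379
cluster-enabled yes
cluster-config-file nodes-6379.conf
cluster-node-timeout 15000
appendonly yes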

Starting the cluster

The 5.0.5 version used here no longer recommends the redis-trib.rb script for creating a cluster.

redis-cli --cluster help shows the detailed usage:

create         host1:port1 ... hostN:portN
--cluster-replicas <arg>

Create the cluster —> ./src/redis-cli --cluster create 127.0.0.1:6379 127.0.0.1:6380 127.0.0.1:6381 127.0.0.1:6382 127.0.0.1:6383 127.0.0.1:6384 --cluster-replicas 1

The argument after the node list specifies how many slaves each master will have. Nodes are taken in order: those listed first become masters, the rest become slaves. When the specified count is infeasible:

*** ERROR: Invalid configuration for cluster creation.
*** Redis Cluster requires at least 3 master nodes.
*** This is not possible with 6 nodes and 3 replicas per node.
*** At least 12 nodes are required.
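
The arithmetic behind this error: with N nodes and --cluster-replicas R, you get N / (1 + R) masters, and Redis Cluster requires at least 3 of them. Here 6 / (1 + 3) = 1.5, i.e. fewer than 3 masters, so creation fails; satisfying 3 masters with 3 replicas each would take 3 × (1 + 3) = 12 nodes, exactly as the message says.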

Cluster created successfully:

[root@localhost redis-5.0.5]# ./src/redis-cli --cluster create 127.0.0.1:6379 127.0.0.1:6380 127.0.0.1:6381 127.0.0.1:6382 127.0.0.1:6383 127.0.0.1:6384 --cluster-replicas 1
>>> Performing hash slots allocation on 6 nodes...
Master[0] -> Slots 0 - 5460
Master[1] -> Slots 5461 - 10922
Master[2] -> Slots 10923 - 16383
Adding replica 127.0.0.1:6383 to 127.0.0.1:6379
Adding replica 127.0.0.1:6384 to 127.0.0.1:6380
Adding replica 127.0.0.1:6382 to 127.0.0.1:6381
>>> Trying to optimize slaves allocation for anti-affinity
[WARNING] Some slaves are in the same host as their master
M: 8608aea70c528e8f632c083b79e44a4782630437 127.0.0.1:6379
slots:[0-5460] (5461 slots) master
M: 82829a4cfa624c3d658e33d66b67d7cfb374d5ea 127.0.0.1:6380
slots:[5461-10922] (5462 slots) master
M: ee723da8dfb10b775570b9069f2fb17bff291e9f 127.0.0.1:6381
slots:[10923-16383] (5461 slots) master
S: c5ea0e80c1597c349753b7191cf9bc529e6c5b56 127.0.0.1:6382
replicates ee723da8dfb10b775570b9069f2fb17bff291e9f
S: 03ff774a84676e19076bd93327ec00c85c6ee015 127.0.0.1:6383
replicates 8608aea70c528e8f632c083b79e44a4782630437
S: 3867efa8d6dd57f7688f0e78e3f7c098d5b0654f 127.0.0.1:6384
replicates 82829a4cfa624c3d658e33d66b67d7cfb374d5ea
Can I set the above configuration? (type 'yes' to accept): yes
>>> Nodes configuration updated
>>> Assign a different config epoch to each node
>>> Sending CLUSTER MEET messages to join the cluster
Waiting for the cluster to join
......
>>> Performing Cluster Check (using node 127.0.0.1:6379)
M: 8608aea70c528e8f632c083b79e44a4782630437 127.0.0.1:6379
slots:[0-5460] (5461 slots) master
1 additional replica(s)
M: ee723da8dfb10b775570b9069f2fb17bff291e9f 127.0.0.1:6381
slots:[10923-16383] (5461 slots) master
1 additional replica(s)
S: c5ea0e80c1597c349753b7191cf9bc529e6c5b56 127.0.0.1:6382
slots: (0 slots) slave
replicates ee723da8dfb10b775570b9069f2fb17bff291e9f
M: 82829a4cfa624c3d658e33d66b67d7cfb374d5ea 127.0.0.1:6380
slots:[5461-10922] (5462 slots) master
1 additional replica(s)
S: 03ff774a84676e19076bd93327ec00c85c6ee015 127.0.0.1:6383
slots: (0 slots) slave
replicates 8608aea70c528e8f632c083b79e44a4782630437
S: 3867efa8d6dd57f7688f0e78e3f7c098d5b0654f 127.0.0.1:6384
slots: (0 slots) slave
replicates 82829a4cfa624c3d658e33d66b67d7cfb374d5ea
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
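
The consistency check at the end can also be re-run at any time against any live node:

./src/redis-cli --cluster check 127.0.0.1:6379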

To actually use the cluster, add the -c flag when connecting.
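
With -c, the client follows slot redirections automatically; a quick illustrative session (the key is arbitrary, and 6381 owns slot 12182 per the allocation above):

[root@localhost redis-5.0.5]# ./src/redis-cli -c -p 6379
127.0.0.1:6379> set foo bar
-> Redirected to slot [12182] located at 127.0.0.1:6381
OK
127.0.0.1:6381> get foo
"bar"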

Nodes going offline

  1. If a slave goes offline, a master in contact with it will mark it as failing once the timeout has passed —>

    Marking node c5ea0e80c1597c349753b7191cf9bc529e6c5b56 as failing (quorum reached).

    and it notifies all the other nodes, each of which receives a message:

    FAIL message received from ee723da8dfb10b775570b9069f2fb17bff291e9f about c5ea0e80c1597c349753b7191cf9bc529e6c5b56

  2. In my tests I saw other masters behave the same way: after stopping one slave, two masters marked it. Once the slave reconnected, one of them (the slave's own master) resumed syncing with it normally, while the other master, like the rest of the nodes, simply cleared the mark:

    Clear FAIL state for node c5ea0e80c1597c349753b7191cf9bc529e6c5b56: replica is reachable again.

  3. So it is not necessarily the slave's own master that marks it and sends the notification; another master that detects the slave is down may do the marking and notifying as well

  4. If a master goes offline, its slave becomes the new master:

    1995:S 08 Mar 2020 01:22:34.810 # Start of election delayed for 791 milliseconds (rank #0, offset 1722).
    1995:S 08 Mar 2020 01:22:34.810 # Cluster state changed: fail
    1995:S 08 Mar 2020 01:22:35.214 * Connecting to MASTER 127.0.0.1:6379
    1995:S 08 Mar 2020 01:22:35.215 * MASTER <-> REPLICA sync started
    1995:S 08 Mar 2020 01:22:35.215 # Error condition on socket for SYNC: Connection refused
    1995:S 08 Mar 2020 01:22:35.620 # Starting a failover election for epoch 7.
    1995:S 08 Mar 2020 01:22:37.208 # Failover election won: I'm the new master.

  5. After the old master reconnects, it comes back as a slave of the new master (see the CLUSTER NODES sketch below)
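
The role swap can be confirmed with CLUSTER NODES. A sketch of the two relevant lines after the failover above, assuming 6379 was the failed master and 6383 its promoted slave (timestamps are illustrative and the other nodes' lines are omitted):

127.0.0.1:6383> cluster nodes
03ff774a84676e19076bd93327ec00c85c6ee015 127.0.0.1:6383@16383 myself,master - 0 0 7 connected 0-5460
8608aea70c528e8f632c083b79e44a4782630437 127.0.0.1:6379@16379 slave 03ff774a84676e19076bd93327ec00c85c6ee015 0 1583630560000 7 connected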