[Original] Building a self-healing Redis cluster on Kubernetes

Reference: https://www.jianshu.com/p/65c4baadf5d9

However, that approach is not perfect. It has a problem:

After a pod restarts its IP changes, but the IPs recorded in Redis's nodes.conf do not. If a single node goes down and restarts, the cluster can still recover; but if more than half of the nodes go down at once, for example when the whole Kubernetes cluster restarts, the Redis cluster cannot recover.

How do we make the cluster recover automatically after a full restart?

Redis cluster nodes rely on the node information recorded in nodes.conf to find and talk to each other. So all we need to do is make sure nodes.conf is rewritten with the new pod IPs before the server starts.
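For reference, each line of nodes.conf looks roughly like the following (node ID, IP and slot range here are illustrative). The first field is the node ID that the scripts below key on; the second is the ip:port@cluster-bus-port entry that has to be refreshed:

2d05fb2406a254f08664b5ff5d26a151b5b262cc 10.244.4.68:6379@16379 myself,master - 0 1562230468000 1 connected 0-5461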

Steps:

After a pod starts, it writes its own node ID → IP mapping to a location every Redis instance can reach. It then takes each node ID in nodes.conf, looks up the corresponding IP, and if that IP answers ping (or passes some other health check) considers it valid and updates nodes.conf. Only once every IP is online does it start the Redis server.

That's the whole idea. The concrete implementation and code follow.

Modify redis-cluster.yaml to add a shared storage area accessible by all pods:

In my case it is an NFS volume mounted at /sh, with start.sh and checkip.sh placed under it. The Redis image ships without a ping tool, so as a shortcut I manually copied ping, libcap.so.2, libcrypto.so.10 and libidn.so.11 into the mounted directory and point LD_LIBRARY_PATH at it in start.sh.
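A minimal sketch of the volume part of that change (the NFS server address and exported path are placeholders; the surrounding StatefulSet spec is assumed to match the referenced article):

      containers:
      - name: redis
        # ... existing image/ports/config ...
        volumeMounts:
        - name: sh
          mountPath: /sh
      volumes:
      - name: sh
        nfs:
          server: 192.168.0.100   # placeholder: your NFS server
          path: /data/redis-sh    # placeholder: exported directory holding the scripts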

start.sh

#!/bin/bash

# The ping binary and its libraries were copied into the shared /sh volume
export LD_LIBRARY_PATH=/sh:$LD_LIBRARY_PATH

# This pod's new IP (from /etc/hosts) and this node's cluster ID (from nodes.conf)
newip=`cat /etc/hosts | grep redis-app | awk '{print $1}'`
myid=`cat /var/lib/redis/nodes.conf | grep myself | awk '{print $1}'`
if [ -z "$newip" ]; then
    echo "Cannot find new ip"
    exit 1
elif [ -z "$myid" ]; then
    echo "no myid"
    exit 1
else
    # Publish our own id -> ip mapping to the shared volume
    echo $newip > /sh/$myid
    echo "refresh ip: $myid -> $newip"
fi

# For every node id in nodes.conf, block until its published IP is reachable
# and rewrite nodes.conf with it; xargs returns non-zero if any check fails
echo "check nodes.conf"
cat /var/lib/redis/nodes.conf | grep -E "master|slave" | awk '{print $1}' | xargs -I{} /sh/checkip.sh {}

if [ $? -eq 0 ]; then
    echo "done nodes.conf"
    redis-server /etc/redis/redis.conf --protected-mode no
else
    echo "abort on error"
fi

checkip.sh

#!/bin/bash

# Usage: checkip.sh <node-id>
# Blocks until the node has published its new IP to /sh/<node-id> and that IP
# answers ping, then swaps the old IP for the new one in nodes.conf.
if [ $# -ne 1 ]; then
    exit 1
fi

while :
do
    echo "while"
    chkip=`cat /sh/$1 2>/dev/null`
    if [ -z "$chkip" ]; then
        # The node has not registered its new IP yet
        sleep 1s
    else
        /sh/ping -c1 $chkip
        if [ $? -eq 0 ]; then
            # The node is alive; find its old IP in nodes.conf
            oldip=`cat /var/lib/redis/nodes.conf | grep -E "^$1" | awk '{print $2}' | cut -d ":" -f1`
            if [ -z "$oldip" ]; then
                echo "no old ip"
                exit 1
            else
                echo "oldip=$oldip and newip=$chkip"
                # Escape the dots so they match literally rather than as regex wildcards
                sed -i "s/${oldip//./\\.}/$chkip/g" /var/lib/redis/nodes.conf
                echo "done $1 $chkip"
                exit 0
            fi
        else
            sleep 1s
        fi
    fi
done

Finally, change the pod's startup command to /sh/start.sh and update the pods:

kubectl apply -f redis-stateful.yaml
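For the startup command itself, the container spec change might look like this (a sketch, not the exact manifest; adapt to how your StatefulSet launches Redis):

      containers:
      - name: redis
        command: ["/bin/bash", "/sh/start.sh"]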

Test: delete the whole redis-stateful StatefulSet and re-create it; the cluster recovers normally (commands and pod logs below).
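To reproduce (assuming the manifest file is redis-stateful.yaml, as above):

kubectl delete -f redis-stateful.yaml
kubectl apply -f redis-stateful.yaml

Here are the logs from one of the recovered pods: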

refresh ip: 2d05fb2406a254f08664b5ff5d26a151b5b262cc -> 10.244.4.68
check nodes.conf
while
PING 10.244.4.68 (10.244.4.68) 56(84) bytes of data.
64 bytes from 10.244.4.68: icmp_seq=1 ttl=64 time=0.072 ms

--- 10.244.4.68 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.072/0.072/0.072/0.000 ms
oldip=10.244.4.66 and newip=10.244.4.68
done 2d05fb2406a254f08664b5ff5d26a151b5b262cc 10.244.4.68
while
PING 10.244.1.89 (10.244.1.89) 56(84) bytes of data.

--- 10.244.1.89 ping statistics ---
1 packets transmitted, 0 received, 100% packet loss, time 0ms

while
PING 10.244.1.89 (10.244.1.89) 56(84) bytes of data.

--- 10.244.1.89 ping statistics ---
1 packets transmitted, 0 received, 100% packet loss, time 0ms

while
PING 10.244.1.91 (10.244.1.91) 56(84) bytes of data.
64 bytes from 10.244.1.91: icmp_seq=1 ttl=62 time=1.86 ms

--- 10.244.1.91 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 1.861/1.861/1.861/0.000 ms
oldip=10.244.1.89 and newip=10.244.1.91
done 1665fb20c43e7468c54bbdea7ed6e283283669df 10.244.1.91
while
PING 10.244.2.135 (10.244.2.135) 56(84) bytes of data.
64 bytes from 10.244.2.135: icmp_seq=1 ttl=62 time=1.87 ms

--- 10.244.2.135 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 1.870/1.870/1.870/0.000 ms
oldip=10.244.2.133 and newip=10.244.2.135
done 7a17b9c2f70a053f98fb480492a8e904d330f9ac 10.244.2.135
while
PING 10.244.1.90 (10.244.1.90) 56(84) bytes of data.
64 bytes from 10.244.1.90: icmp_seq=1 ttl=62 time=1.96 ms

--- 10.244.1.90 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 1.967/1.967/1.967/0.000 ms
oldip=10.244.1.88 and newip=10.244.1.90
done 12cf9455a800f940f427c745318d1300a6730103 10.244.1.90
while
PING 10.244.4.69 (10.244.4.69) 56(84) bytes of data.
64 bytes from 10.244.4.69: icmp_seq=1 ttl=64 time=0.106 ms

--- 10.244.4.69 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.106/0.106/0.106/0.000 ms
oldip=10.244.4.67 and newip=10.244.4.69
done d1e352d917cb9dfd5be2b053ef34473e11c7ea23 10.244.4.69
while
PING 10.244.2.136 (10.244.2.136) 56(84) bytes of data.
64 bytes from 10.244.2.136: icmp_seq=1 ttl=62 time=1.89 ms

--- 10.244.2.136 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 1.899/1.899/1.899/0.000 ms
oldip=10.244.2.134 and newip=10.244.2.136
done 283aa9be8d0d4b25bfb79cf0a7eb084284b4f44d 10.244.2.136
done nodes.conf
78:C 04 Jul 2019 08:54:28.879 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
78:C 04 Jul 2019 08:54:28.879 # Redis version=5.0.4, bits=64, commit=00000000, modified=0, pid=78, just started
78:C 04 Jul 2019 08:54:28.879 # Configuration loaded
78:M 04 Jul 2019 08:54:28.887 * Node configuration loaded, I'm 2d05fb2406a254f08664b5ff5d26a151b5b262cc
78:M 04 Jul 2019 08:54:28.888 * Running mode=cluster, port=6379.
78:M 04 Jul 2019 08:54:28.888 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
78:M 04 Jul 2019 08:54:28.888 # Server initialized
78:M 04 Jul 2019 08:54:28.888 # WARNING you have Transparent Huge Pages (THP) support enabled in your kernel. This will create latency and memory usage issues with Redis. To fix this issue run the command 'echo never > /sys/kernel/mm/transparent_hugepage/enabled' as root, and add it to your /etc/rc.local in order to retain the setting after a reboot. Redis must be restarted after THP is disabled.
78:M 04 Jul 2019 08:54:28.890 * Reading RDB preamble from AOF file...
78:M 04 Jul 2019 08:54:28.890 * Reading the remaining AOF tail...
78:M 04 Jul 2019 08:54:28.892 * DB loaded from append only file: 0.004 seconds
78:M 04 Jul 2019 08:54:28.892 * Ready to accept connections
78:M 04 Jul 2019 08:54:28.899 * Clear FAIL state for node 283aa9be8d0d4b25bfb79cf0a7eb084284b4f44d: replica is reachable again.
78:M 04 Jul 2019 08:54:32.367 * Clear FAIL state for node 1665fb20c43e7468c54bbdea7ed6e283283669df: replica is reachable again.
78:M 04 Jul 2019 08:54:32.462 * Clear FAIL state for node d1e352d917cb9dfd5be2b053ef34473e11c7ea23: replica is reachable again.
78:M 04 Jul 2019 08:54:33.316 * Replica 10.244.1.91:6379 asks for synchronization
78:M 04 Jul 2019 08:54:33.316 * Partial resynchronization not accepted: Replication ID mismatch (Replica asked for 'ce61d735f3070e58c8d1eef8d705416e1dec10d8', my replication IDs are 'b726060355dca8aa11c4bf4c47e29d4423347021' and '0000000000000000000000000000000000000000')
78:M 04 Jul 2019 08:54:33.316 * Starting BGSAVE for SYNC with target: disk
78:M 04 Jul 2019 08:54:33.318 * Background saving started by pid 82
82:C 04 Jul 2019 08:54:33.325 * DB saved on disk
82:C 04 Jul 2019 08:54:33.328 * RDB: 4 MB of memory used by copy-on-write
78:M 04 Jul 2019 08:54:33.372 * Background saving terminated with success
78:M 04 Jul 2019 08:54:33.376 * Synchronization with replica 10.244.1.91:6379 succeeded
78:M 04 Jul 2019 08:54:34.061 * Marking node 12cf9455a800f940f427c745318d1300a6730103 as failing (quorum reached).
78:M 04 Jul 2019 08:54:35.294 # Failover auth granted to d1e352d917cb9dfd5be2b053ef34473e11c7ea23 for epoch 47
78:M 04 Jul 2019 08:54:37.016 * Clear FAIL state for node 12cf9455a800f940f427c745318d1300a6730103: replica is reachable again.
78:M 04 Jul 2019 08:54:39.654 * Clear FAIL state for node 7a17b9c2f70a053f98fb480492a8e904d330f9ac: is reachable again and nobody is serving its slots after some time.
78:M 04 Jul 2019 08:54:40.364 # Cluster state changed: ok