[Original] Building a Self-Recovering Redis Cluster in K8s
Based on: https://www.jianshu.com/p/65c4baadf5d9
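That approach is not perfect, though. It has a problem:
When a pod restarts, its pod IP changes, but the IPs recorded in Redis's nodes.conf do not. If a single node goes down and restarts, the cluster can recover: the restarted node still has the correct addresses of its peers, reconnects to them, and they pick up its new IP. If more than half of the nodes go down at once, for example when the whole K8s cluster restarts, every node holds only stale addresses for the others, and the Redis cluster cannot recover.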
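How can the cluster recover automatically when it restarts as a whole?
Redis relies on the node information in nodes.conf to find and talk to the other nodes. So all we need to do is make sure nodes.conf gets updated with the new pod IPs before Redis starts.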
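Concretely, each node line in nodes.conf starts with the node ID followed by ip:port@cport, and that IP is what goes stale after a restart. A minimal sketch of reading the ID-to-IP pairs with the same awk-style parsing the scripts in this post use; the function name list_node_ips and the sample file contents are made up for illustration:

```shell
#!/bin/bash
# Print "<node-id> <ip>" for every node line of a Redis Cluster nodes.conf.
# Node lines look like: <id> <ip>:<port>@<cport> <flags> <master-id> ...
# The trailing "vars currentEpoch ... lastVoteEpoch ..." line is skipped.
list_node_ips() {
  awk 'NF >= 2 && $1 != "vars" { split($2, addr, ":"); print $1, addr[1] }' "$1"
}

# Illustrative nodes.conf with made-up node IDs and IPs.
demo_conf=$(mktemp)
cat > "$demo_conf" <<'EOF'
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa 10.244.4.66:6379@16379 myself,master - 0 1562230468000 47 connected 0-5460
bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb 10.244.1.89:6379@16379 slave aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa 0 1562230468000 47 connected
vars currentEpoch 47 lastVoteEpoch 47
EOF
list_node_ips "$demo_conf"
# aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa 10.244.4.66
# bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb 10.244.1.89
rm -f "$demo_conf"
```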
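Steps:
1. When a pod starts, it writes a mapping from its Redis node ID to its current IP into a location that every Redis instance can access.
2. For every node ID in nodes.conf, it looks up the IP published for that ID. If the IP answers a ping (or passes some other health check), it is considered valid and written into nodes.conf. The check has to work before Redis is running: every pod waits for all the others before starting redis-server, so probing port 6379 would never succeed.
3. Once all the IPs are reachable, it starts the Redis instance.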
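The shared location the pods publish to can be as simple as one small file per node ID in a directory every pod mounts, which is what the scripts below do with /sh. A minimal sketch of that registry as two helper functions; publish_ip and lookup_ip are hypothetical names, and the write-then-rename is an extra precaution that the original scripts do not take:

```shell
#!/bin/bash
# Sketch of the shared "node ID -> IP" registry: one file per node ID,
# containing that node's latest IP. Function names are hypothetical.

# publish_ip DIR ID IP: record IP as the current address of node ID.
publish_ip() {
  local dir=$1 id=$2 ip=$3
  # Write a temp file, then rename it over the real one, so a concurrent
  # reader sees either the previous IP or the complete new one.
  printf '%s\n' "$ip" > "$dir/.$id.tmp" && mv -f "$dir/.$id.tmp" "$dir/$id"
}

# lookup_ip DIR ID: print the last published IP of node ID (nothing if none yet).
lookup_ip() {
  local dir=$1 id=$2
  cat "$dir/$id" 2>/dev/null
}

# Example with a temporary directory standing in for the shared /sh mount.
registry=$(mktemp -d)
publish_ip "$registry" 2d05fb2406a254f08664b5ff5d26a151b5b262cc 10.244.4.68
lookup_ip "$registry" 2d05fb2406a254f08664b5ff5d26a151b5b262cc   # prints 10.244.4.68
rm -rf "$registry"
```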
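That is the idea. Below is the concrete implementation with the code.
Modify redis-cluster.yaml to add a shared area that every pod can access:
In my case this is NFS, mounted at /sh. Put start.sh and checkip.sh under /sh. The Redis image has no ping tool, so out of laziness I manually copied ping, libcap.so.2, libcrypto.so.10 and libidn.so.11 into the mounted directory and set LD_LIBRARY_PATH in start.sh.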
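Before relying on the copied ping, it may help to confirm inside the Redis container that it resolves all of its shared libraries once /sh is on LD_LIBRARY_PATH. A minimal sketch, assuming ldd is available in the Redis image; missing_libs is a hypothetical helper, not part of the original setup:

```shell
#!/bin/bash
# missing_libs BINARY LIBDIR: list the shared libraries BINARY still cannot
# resolve when LIBDIR is prepended to LD_LIBRARY_PATH (as start.sh does).
# Prints one library name per line and returns 1 if anything is missing.
missing_libs() {
  local bin=$1 libdir=$2 missing
  missing=$(LD_LIBRARY_PATH="$libdir${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}" \
    ldd "$bin" 2>/dev/null | awk '/not found/ {print $1}')
  if [ -n "$missing" ]; then
    printf '%s\n' "$missing"
    return 1
  fi
  return 0
}

# Inside the Redis container, with the paths used in this post:
# missing_libs /sh/ping /sh || echo "copy the libraries listed above into /sh"
```

ldd reports an unresolved dependency as "libname => not found", so the first field of those lines is exactly the file that still has to be copied.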
start.sh
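#!/bin/bash
# Let the copied /sh/ping find the libraries copied next to it
# (the :+ form avoids a trailing ":" that would add the current directory)
export LD_LIBRARY_PATH=/sh${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}

# This pod's current IP: the /etc/hosts line with the pod's redis-app hostname
newip=$(grep redis-app /etc/hosts | awk '{print $1}')
# This node's own cluster ID: the line flagged "myself" in nodes.conf
myid=$(grep myself /var/lib/redis/nodes.conf | awk '{print $1}')

if [ -z "$newip" ]; then
    echo "Cannot find new ip"; exit 1
elif [ -z "$myid" ]; then
    echo "no myid"; exit 1
else
    # Publish "node ID -> current IP" in the shared directory
    echo "$newip" > /sh/$myid
    echo "refresh ip: $myid -> $newip"
fi

echo "check nodes.conf"
# Run checkip.sh once per node ID, one at a time; xargs exits non-zero
# if any checkip.sh invocation fails
grep -E "master|slave" /var/lib/redis/nodes.conf | awk '{print $1}' | xargs -I {} /sh/checkip.sh {}
if [ $? -eq 0 ]; then
    echo "done nodes.conf"
    # exec: redis-server replaces the shell and receives the pod's stop signal
    exec redis-server /etc/redis/redis.conf --protected-mode no
else
    echo "abort on error"
    exit 1
fi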
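start.sh decides whether to launch redis-server from $? right after the xargs pipeline. That works because xargs reports failure if any invocation fails: POSIX only promises a status between 1 and 125, and GNU xargs uses 123. xargs also keeps going after a failed invocation and runs the checks one at a time, which is why the pod log at the end shows the nodes being checked in sequence. A small sketch reproducing this; run_all is a hypothetical helper:

```shell
#!/bin/bash
# Reproduces the status check in start.sh: the inputs are piped into xargs,
# which runs one command per line; $? afterwards is the exit status of xargs.
# run_all CODE...: run "exit CODE" once per argument via xargs, print xargs' status.
run_all() {
  printf '%s\n' "$@" | xargs -I {} sh -c 'exit "$1"' _ {}
  echo $?
}

run_all 0 0 0   # prints 0: every invocation succeeded
run_all 0 1 0   # prints 123 with GNU xargs: some invocation exited with 1-125
```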
checkip.sh
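#!/bin/bash
# Usage: checkip.sh <node-id>
# Waits until the IP published for <node-id> in /sh answers a ping,
# then writes that IP into this node's nodes.conf.
if [ $# -ne 1 ]; then exit 1; fi

while :
do
    echo "while"
    # IP last published by the pod owning this node ID (file may not exist yet)
    chkip=$(cat /sh/$1 2>/dev/null)
    if [ -z "$chkip" ]; then
        sleep 1s
    else
        /sh/ping -c1 $chkip
        if [ $? -eq 0 ]; then
            # IP currently recorded for this node (field 2 is ip:port@cport)
            oldip=$(grep -E "^$1" /var/lib/redis/nodes.conf | awk '{print $2}' | cut -d ":" -f1)
            if [ -z "$oldip" ]; then
                echo "no old ip"; exit 1
            else
                echo "oldip=$oldip and newip=$chkip"
                # Rewrite only this node's line: a global s/$oldip/$chkip/g could
                # also hit another node whose new IP equals this old IP
                sed -i "s/^$1 [^:]*:/$1 $chkip:/" /var/lib/redis/nodes.conf
                echo "done $1 $chkip"; exit 0
            fi
        else
            # Not reachable yet: pod not up, or /sh/$1 still holds a stale IP
            sleep 1s
        fi
    fi
done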
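One detail worth checking in checkip.sh is the scope of the IP rewrite in nodes.conf. A global substitution such as sed "s/$oldip/$chkip/g" edits every line containing the old IP, even as a substring (10.244.1.8 matches inside 10.244.1.89, and the dots match any character). Pod IPs also get reused: if node B's new IP equals node A's old IP, fixing B first and then A moves both nodes to A's new IP. A minimal sketch that limits the rewrite to the node's own line; rewrite_node_ip is a hypothetical name and the sample file is made up:

```shell
#!/bin/bash
# rewrite_node_ip CONF ID NEWIP: replace the IP of node ID in CONF, touching
# only the line that starts with ID. Lines look like: <id> <ip>:<port>@<cport> ...
# Node IDs are hex strings, so they are safe to embed in the sed pattern.
rewrite_node_ip() {
  local conf=$1 id=$2 newip=$3
  sed -i "s/^$id [^:]*:/$id $newip:/" "$conf"
}

conf=$(mktemp)
cat > "$conf" <<'EOF'
aaaa 10.244.1.89:6379@16379 master - 0 0 1 connected 0-8191
bbbb 10.244.1.88:6379@16379 master - 0 0 2 connected 8192-16383
EOF
# IP reuse: bbbb now has aaaa's old IP 10.244.1.89, and aaaa has 10.244.1.91.
rewrite_node_ip "$conf" bbbb 10.244.1.89
rewrite_node_ip "$conf" aaaa 10.244.1.91
awk '{print $1, $2}' "$conf"
# aaaa 10.244.1.91:6379@16379
# bbbb 10.244.1.89:6379@16379
rm -f "$conf"
```

With the global version, the second substitution would also have turned bbbb's freshly written 10.244.1.89 into 10.244.1.91.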
Finally, change the pod's startup command to /sh/start.sh and update the pods:
kubectl apply -f redis-stateful.yaml
Test: delete redis-stateful entirely and create it again. The cluster recovers normally; here is the pod log:
refresh ip: 2d05fb2406a254f08664b5ff5d26a151b5b262cc -> 10.244.4.68
check nodes.conf
while
PING 10.244.4.68 (10.244.4.68) 56(84) bytes of data.
64 bytes from 10.244.4.68: icmp_seq=1 ttl=64 time=0.072 ms
--- 10.244.4.68 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.072/0.072/0.072/0.000 ms
oldip=10.244.4.66 and newip=10.244.4.68
done 2d05fb2406a254f08664b5ff5d26a151b5b262cc 10.244.4.68
while
PING 10.244.1.89 (10.244.1.89) 56(84) bytes of data.
--- 10.244.1.89 ping statistics ---
1 packets transmitted, 0 received, 100% packet loss, time 0ms
while
PING 10.244.1.89 (10.244.1.89) 56(84) bytes of data.
--- 10.244.1.89 ping statistics ---
1 packets transmitted, 0 received, 100% packet loss, time 0ms
while
PING 10.244.1.91 (10.244.1.91) 56(84) bytes of data.
64 bytes from 10.244.1.91: icmp_seq=1 ttl=62 time=1.86 ms
--- 10.244.1.91 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 1.861/1.861/1.861/0.000 ms
oldip=10.244.1.89 and newip=10.244.1.91
done 1665fb20c43e7468c54bbdea7ed6e283283669df 10.244.1.91
while
PING 10.244.2.135 (10.244.2.135) 56(84) bytes of data.
64 bytes from 10.244.2.135: icmp_seq=1 ttl=62 time=1.87 ms
--- 10.244.2.135 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 1.870/1.870/1.870/0.000 ms
oldip=10.244.2.133 and newip=10.244.2.135
done 7a17b9c2f70a053f98fb480492a8e904d330f9ac 10.244.2.135
while
PING 10.244.1.90 (10.244.1.90) 56(84) bytes of data.
64 bytes from 10.244.1.90: icmp_seq=1 ttl=62 time=1.96 ms
--- 10.244.1.90 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 1.967/1.967/1.967/0.000 ms
oldip=10.244.1.88 and newip=10.244.1.90
done 12cf9455a800f940f427c745318d1300a6730103 10.244.1.90
while
PING 10.244.4.69 (10.244.4.69) 56(84) bytes of data.
64 bytes from 10.244.4.69: icmp_seq=1 ttl=64 time=0.106 ms
--- 10.244.4.69 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.106/0.106/0.106/0.000 ms
oldip=10.244.4.67 and newip=10.244.4.69
done d1e352d917cb9dfd5be2b053ef34473e11c7ea23 10.244.4.69
while
PING 10.244.2.136 (10.244.2.136) 56(84) bytes of data.
64 bytes from 10.244.2.136: icmp_seq=1 ttl=62 time=1.89 ms
--- 10.244.2.136 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 1.899/1.899/1.899/0.000 ms
oldip=10.244.2.134 and newip=10.244.2.136
done 283aa9be8d0d4b25bfb79cf0a7eb084284b4f44d 10.244.2.136
done nodes.conf
78:C 04 Jul 2019 08:54:28.879 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
78:C 04 Jul 2019 08:54:28.879 # Redis version=5.0.4, bits=64, commit=00000000, modified=0, pid=78, just started
78:C 04 Jul 2019 08:54:28.879 # Configuration loaded
78:M 04 Jul 2019 08:54:28.887 * Node configuration loaded, I'm 2d05fb2406a254f08664b5ff5d26a151b5b262cc
78:M 04 Jul 2019 08:54:28.888 * Running mode=cluster, port=6379.
78:M 04 Jul 2019 08:54:28.888 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
78:M 04 Jul 2019 08:54:28.888 # Server initialized
78:M 04 Jul 2019 08:54:28.888 # WARNING you have Transparent Huge Pages (THP) support enabled in your kernel. This will create latency and memory usage issues with Redis. To fix this issue run the command 'echo never > /sys/kernel/mm/transparent_hugepage/enabled' as root, and add it to your /etc/rc.local in order to retain the setting after a reboot. Redis must be restarted after THP is disabled.
78:M 04 Jul 2019 08:54:28.890 * Reading RDB preamble from AOF file...
78:M 04 Jul 2019 08:54:28.890 * Reading the remaining AOF tail...
78:M 04 Jul 2019 08:54:28.892 * DB loaded from append only file: 0.004 seconds
78:M 04 Jul 2019 08:54:28.892 * Ready to accept connections
78:M 04 Jul 2019 08:54:28.899 * Clear FAIL state for node 283aa9be8d0d4b25bfb79cf0a7eb084284b4f44d: replica is reachable again.
78:M 04 Jul 2019 08:54:32.367 * Clear FAIL state for node 1665fb20c43e7468c54bbdea7ed6e283283669df: replica is reachable again.
78:M 04 Jul 2019 08:54:32.462 * Clear FAIL state for node d1e352d917cb9dfd5be2b053ef34473e11c7ea23: replica is reachable again.
78:M 04 Jul 2019 08:54:33.316 * Replica 10.244.1.91:6379 asks for synchronization
78:M 04 Jul 2019 08:54:33.316 * Partial resynchronization not accepted: Replication ID mismatch (Replica asked for 'ce61d735f3070e58c8d1eef8d705416e1dec10d8', my replication IDs are 'b726060355dca8aa11c4bf4c47e29d4423347021' and '0000000000000000000000000000000000000000')
78:M 04 Jul 2019 08:54:33.316 * Starting BGSAVE for SYNC with target: disk
78:M 04 Jul 2019 08:54:33.318 * Background saving started by pid 82
82:C 04 Jul 2019 08:54:33.325 * DB saved on disk
82:C 04 Jul 2019 08:54:33.328 * RDB: 4 MB of memory used by copy-on-write
78:M 04 Jul 2019 08:54:33.372 * Background saving terminated with success
78:M 04 Jul 2019 08:54:33.376 * Synchronization with replica 10.244.1.91:6379 succeeded
78:M 04 Jul 2019 08:54:34.061 * Marking node 12cf9455a800f940f427c745318d1300a6730103 as failing (quorum reached).
78:M 04 Jul 2019 08:54:35.294 # Failover auth granted to d1e352d917cb9dfd5be2b053ef34473e11c7ea23 for epoch 47
78:M 04 Jul 2019 08:54:37.016 * Clear FAIL state for node 12cf9455a800f940f427c745318d1300a6730103: replica is reachable again.
78:M 04 Jul 2019 08:54:39.654 * Clear FAIL state for node 7a17b9c2f70a053f98fb480492a8e904d330f9ac: is reachable again and nobody is serving its slots after some time.
78:M 04 Jul 2019 08:54:40.364 # Cluster state changed: ok
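To confirm the recovery without reading pod logs, one option is to ask any node for CLUSTER INFO and look for cluster_state:ok, the same state the last log line reports. A small sketch; cluster_state_ok is a hypothetical helper, and the pod name redis-app-0 in the commented kubectl line is an assumption based on the redis-app hostname that start.sh looks for:

```shell
#!/bin/bash
# cluster_state_ok: succeed if the CLUSTER INFO output on stdin reports
# cluster_state:ok. The reply uses CRLF line endings, so strip '\r' first.
cluster_state_ok() {
  tr -d '\r' | grep -qx 'cluster_state:ok'
}

# Hypothetical usage (pod name is an assumption):
# kubectl exec redis-app-0 -- redis-cli cluster info | cluster_state_ok && echo "cluster ok"
```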