MySQL高可用解决方案MMM(mysql多主复制管理器)
一、MMM简介: MMM即Multi-Master Replication Manager for MySQL:mysql多主复制管理器,基于perl实现,关于mysql主主复制配置的监控、故障转移和管理的一套可伸缩的脚本套件(在任何时候只有一个节点
一、MMM简介:
MMM即Multi-Master Replication Manager for MySQL:mysql多主复制管理器,基于perl实现,关于mysql主主复制配置的监控、故障转移和管理的一套可伸缩的脚本套件(在任何时候只有一个节点可以被写入),MMM也能对从服务器进行读负载均衡,所以可以用它来在一组用于复制的服务器启动虚拟ip,除此之外,它还有实现数据备份、节点之间重新同步功能的脚本。MySQL本身没有提供replication failover的解决方案,通过MMM方案能实现服务器的故障转移,从而实现mysql的高可用。MMM不仅能提供浮动IP的功能,如果当前的主服务器挂掉后,会将你后端的从服务器自动转向新的主服务器进行同步复制,不用手工更改同步配置。这个方案是目前比较成熟的解决方案。详情请看官网:http://mysql-mmm.org
优点:高可用性,扩展性好,出现故障自动切换,对于主主同步,在同一时间只提供一台数据库写操作,保证的数据的一致性。当主服务器挂掉以后,另一个主立即接管,其他的从服务器能自动切换,不用人工干预。
缺点:monitor节点是单点,不过这个你也可以结合keepalived或者haertbeat做成高可用;至少三个节点,对主机的数量有要求,需要实现读写分离,还需要在前端编写读写分离程序。在读写非常繁忙的业务系统下表现不是很稳定,可能会出现复制延时、切换失效等问题。MMM方案并不太适应于对数据安全性要求很高,并且读、写繁忙的环境中。
适用场景:
MMM的适用场景为数据库访问量大,并且能实现读写分离的场景。 Mmm主要功能由下面三个脚本提供: mmm_mond 负责所有的监控工作的监控守护进程,决定节点的移除(mmm_mond进程定时心跳检测,失败则将write ip浮动到另外一台master)等等 mmm_agentd 运行在mysql服务器上的代理守护进程,通过简单远程服务集提供给监控节点 mmm_control 通过命令行管理mmm_mond进程 在整个监管过程中,需要在mysql中添加相关授权用户,授权的用户包括一个mmm_monitor用户和一个mmm_agent用户,如果想使用mmm的备份工具则还要添加一个mmm_tools用户。
二、部署实施
1、环境介绍
OS:centos7.2(64位)数据库系统:mysql5.7.13
关闭selinux
配置ntp,同步时间
角色
IP
hostname
Server-id
Write vip
Read vip
Master1
192.168.31.83
master1
1
192.168.31.2
Master2(backup)
192.168.31.141
master2
2
192.168.31.3
Slave1
192.168.31.250
slave1
3
192.168.31.4
Slave2
192.168.31.225
slave2
4
192.168.31.5
monitor
192.168.31.106
monitor1
无
2、在所有主机上配置/etc/hosts文件,添加如下内容:
192.168.31.83 master1 192.168.31.141 master2 192.168.31.250 slave1 192.168.31.225 slave2 192.168.31.106 monitor1
在所有主机上安装perl、perl-develperl-CPAN libart_lgpl.x86_64 rrdtool.x86_64 rrdtool-perl.x86_64包 #yum -y install perl-* libart_lgpl.x86_64 rrdtool.x86_64 rrdtool-perl.x86_64
注:使用centos7在线yum源安装
安装perl的相关库
#cpan -i Algorithm::Diff Class::Singleton DBI DBD::mysql Log::Dispatch Log::Log4perl Mail::Send Net::Ping Proc::Daemon Time::HiRes Params::Validate Net::ARP
3、在master1、master2、slave1、slave2主机上安装mysql5.7和配置复制
master1和master2互为主从,slave1、slave2为master1的从 在每个mysql的配置文件/etc/my.cnf中加入以下内容, 注意server_id不能重复。
master1主机:
log-bin = mysql-bin binlog_format = mixed server-id = 1 relay-log = relay-bin relay-log-index = slave-relay-bin.index log-slave-updates = 1 auto-increment-increment = 2 auto-increment-offset = 1 master2主机: log-bin = mysql-bin binlog_format = mixed server-id = 2 relay-log = relay-bin relay-log-index = slave-relay-bin.index log-slave-updates = 1 auto-increment-increment = 2 auto-increment-offset = 2 slave1主机: server-id = 3 relay-log = relay-bin relay-log-index = slave-relay-bin.index read_only = 1 slave2主机: server-id = 4 relay-log = relay-bin relay-log-index = slave-relay-bin.index read_only = 1
在完成了对my.cnf的修改后,通过systemctl restart mysqld重新启动mysql服务
4台数据库主机若要开启防火墙,要么关闭防火墙或者创建访问规则:
firewall-cmd --permanent --add-port=3306/tcp firewall-cmd --reload 主从配置(master1和master2配置成主主,slave1和slave2配置成master1的从): 在master1上授权: mysql> grant replication slave on *.* to rep@'192.168.31.%' identified by '123456'; 在master2上授权: mysql> grant replication slave on *.* to rep@'192.168.31.%' identified by '123456'; 把master2、slave1和slave2配置成master1的从库: 在master1上执行show master status; 获取binlog文件和Position点 mysql> show master status; +------------------+----------+--------------+------------------+--------------------------------------------------+ | File | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set | +------------------+----------+--------------+------------------+---------------------------------------------------+ | mysql-bin.000001 | 452 | | | | +------------------+----------+--------------+------------------+-----------------------------------------------------+ 在master2、slave1和slave2执行
mysql> change master to master_host='192.168.31.83',master_port=3306,master_user='rep',master_password='123456',master_log_file='mysql-bin.000001',master_log_pos=452; mysql>slave start; 验证主从复制: master2主机: mysql> show slave statusG; *************************** 1. row *************************** Slave_IO_State: Waiting for master to send event Master_Host: 192.168.31.83 Master_User: rep Master_Port: 3306 Connect_Retry: 60 Master_Log_File: mysql-bin.000001 Read_Master_Log_Pos: 452 Relay_Log_File: relay-bin.000002 Relay_Log_Pos: 320 Relay_Master_Log_File: mysql-bin.000001 Slave_IO_Running: Yes Slave_SQL_Running: Yes slave1主机: mysql> show slave statusG; *************************** 1. row *************************** Slave_IO_State: Waiting for master to send event Master_Host: 192.168.31.83 Master_User: rep Master_Port: 3306 Connect_Retry: 60 Master_Log_File: mysql-bin.000001 Read_Master_Log_Pos: 452 Relay_Log_File: relay-bin.000002 Relay_Log_Pos: 320 Relay_Master_Log_File: mysql-bin.000001 Slave_IO_Running: Yes Slave_SQL_Running: Yes slave2主机: mysql> show slave statusG; *************************** 1. row *************************** Slave_IO_State: Waiting for master to send event Master_Host: 192.168.31.83 Master_User: rep Master_Port: 3306 Connect_Retry: 60 Master_Log_File: mysql-bin.000001 Read_Master_Log_Pos: 452 Relay_Log_File: relay-bin.000002 Relay_Log_Pos: 320 Relay_Master_Log_File: mysql-bin.000001 Slave_IO_Running: Yes Slave_SQL_Running: Yes 如果Slave_IO_Running和Slave_SQL_Running都为yes,那么主从就已经配置OK了 把master1配置成master2的从库: 在master2上执行show master status ;获取binlog文件和Position点 mysql> show master status; +------------------+----------+--------------+------------------+--------------------------------------------------+ | File | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set | +------------------+----------+--------------+------------------+---------------------------------------------------+ | mysql-bin.000001 | 452 | | | | +------------------+----------+--------------+------------------+----------------------------------------------------+ 在master1上执行: mysql> change master to master_host='192.168.31.141',master_port=3306,master_user='rep',master_password='123456',master_log_file='mysql-bin.000001',master_log_pos=452; mysql> start slave; 验证主从复制: master1主机: mysql> show slave statusG; *************************** 1. row *************************** Slave_IO_State: Waiting for master to send event Master_Host: 192.168.31.141 Master_User: rep Master_Port: 3306 Connect_Retry: 60 Master_Log_File: mysql-bin.000001 Read_Master_Log_Pos: 452 Relay_Log_File: relay-bin.000002 Relay_Log_Pos: 320 Relay_Master_Log_File: mysql-bin.000001 Slave_IO_Running: Yes Slave_SQL_Running: Yes 如果Slave_IO_Running和Slave_SQL_Running都为yes,那么主从就已经配置OK了 4、mysql-mmm配置: 在4台mysql节点上创建用户 创建代理账号: mysql> grant super,replicationclient,process on *.* to 'mmm_agent'@'192.168.31.%' identified by '123456'; 创建监控账号: mysql> grant replication client on *.* to 'mmm_monitor'@'192.168.31.%' identified by '123456'; 注1:因为之前的主从复制,以及主从已经是ok的,所以我在master1服务器执行就ok了。 检查master2和slave1、slave2三台db上是否都存在监控和代理账号 mysql> select user,host from mysql.user where user in ('mmm_monitor','mmm_agent'); +-------------+----------------------------+ | user | host | +-------------+----------------------------+ | mmm_agent | 192.168.31.% | | mmm_monitor | 192.168.31.% | +-------------+------------------------------+ 或 mysql> show grants for 'mmm_agent'@'192.168.31.%'; +-----------------------------------------------------------------------------------------------------------------------------+ | Grants for mmm_agent@192.168.31.% | +-----------------------------------------------------------------------------------------------------------------------------+ | GRANT PROCESS, SUPER, REPLICATION CLIENT ON *.* TO 'mmm_agent'@'192.168.31.%' | +-----------------------------------------------------------------------------------------------------------------------------+ mysql> show grants for 'mmm_monitor'@'192.168.31.%'; +-----------------------------------------------------------------------------------------------------------------------------+ | Grants for mmm_monitor@192.168.31.% | +-----------------------------------------------------------------------------------------------------------------------------+ | GRANT REPLICATION CLIENT ON *.* TO 'mmm_monitor'@'192.168.31.%' | 注2: mmm_monitor用户:mmm监控用于对mysql服务器进程健康检查 mmm_agent用户:mmm代理用来更改只读模式,复制的主服务器等 5、mysql-mmm安装 在monitor主机(192.168.31.106) 上安装监控程序 cd /tmp wget http://pkgs.fedoraproject.org/repo/pkgs/mysql-mmm/mysql-mmm-2.2.1.tar.gz/f5f8b48bdf89251d3183328f0249461e/mysql-mmm-2.2.1.tar.gz tar -zxf mysql-mmm-2.2.1.tar.gz cd mysql-mmm-2.2.1 make install 在数据库服务器(master1、master2、slave1、slave2)上安装代理 cd /tmp wget http://pkgs.fedoraproject.org/repo/pkgs/mysql-mmm/mysql-mmm-2.2.1.tar.gz/f5f8b48bdf89251d3183328f0249461e/mysql-mmm-2.2.1.tar.gz tar -zxf mysql-mmm-2.2.1.tar.gz cd mysql-mmm-2.2.1 make install
6、配置mmm
编写配置文件,五台主机必须一致:
完成安装后,所有的配置文件都放到了/etc/mysql-mmm/下面。管理服务器和数据库服务器上都要包含一个共同的文件mmm_common.conf,内容如下:
active_master_rolewriter#积极的master角色的标示,所有的db服务器要开启read_only参数,对于writer服务器监控代理会自动将read_only属性关闭。
解决方法:
# cpanProc::Daemon
1. cpan Log::Log4perl
1. /etc/init.d/mysql-mmm-agent start
Daemon bin: '/usr/sbin/mmm_agentd'
Daemon pid: '/var/run/mmm_agentd.pid'
Starting MMM Agent daemon... Ok
1. netstat -antp | grep mmm_agentd
tcp 0 0 192.168.31.83:9989 0.0.0.0:* LISTEN 9693/mmm_agentd
配置防火墙
firewall-cmd --permanent --add-port=9989/tcp
firewall-cmd --reload
编辑 monitor主机上的/etc/mysql-mmm/mmm_mon.conf
includemmm_common.conf
启动报错:
Starting MMM Monitor daemon: Can not locate Proc/Daemon.pm in @INC (@INC contains: /usr/local/lib64/perl5 /usr/local/share/perl5 /usr/lib64/perl5/vendor_perl /usr/share/perl5/vendor_perl /usr/lib64/perl5 /usr/share/perl5 .) at /usr/sbin/mmm_mond line 11. BEGIN failed--compilation aborted at /usr/sbin/mmm_mond line 11. failed 解决方法:安装下列perl的库 #cpanProc::Daemon #cpan Log::Log4perl [root@monitor1 ~]# /etc/init.d/mysql-mmm-monitor start Daemon bin: '/usr/sbin/mmm_mond' Daemon pid: '/var/run/mmm_mond.pid' Starting MMM Monitor daemon: Ok [root@monitor1 ~]# netstat -anpt | grep 9988 tcp 0 0 127.0.0.1:9988 0.0.0.0:* LISTEN 8546/mmm_mond 注1:无论是在db端还是在监控端如果有对配置文件进行修改操作都需要重启代理进程和监控进程。 注2:MMM启动顺序:先启动monitor,再启动 agent
检查集群状态:
[root@monitor1 ~]# mmm_control show master1(192.168.31.83) master/ONLINE. Roles: writer(192.168.31.2) master2(192.168.31.141) master/ONLINE. Roles: reader(192.168.31.5) slave1(192.168.31.250) slave/ONLINE. Roles: reader(192.168.31.4) slave2(192.168.31.225) slave/ONLINE. Roles: reader(192.168.31.3)
如果服务器状态不是ONLINE,可以用如下命令将服务器上线,例如:
#mmm_controlset_online主机名
例如:[root@monitor1 ~]#mmm_controlset_onlinemaster1
从上面的显示可以看到,写请求的VIP在master1上,所有从节点也都把master1当做主节点。
查看是否启用vip
[root@master1 ~]# ipaddr show dev eno16777736
eno16777736:
MMM高可用性测试:
服务器读写采有VIP地址进行读写,出现故障时VIP会漂移到其它节点,由其它节点提供服务。 首先查看整个集群的状态,可以看到整个集群状态正常 [root@monitor1 ~]# mmm_control show master1(192.168.31.83) master/ONLINE. Roles: writer(192.168.31.2) master2(192.168.31.141) master/ONLINE. Roles: reader(192.168.31.5) slave1(192.168.31.250) slave/ONLINE. Roles: reader(192.168.31.4) slave2(192.168.31.225) slave/ONLINE. Roles: reader(192.168.31.3) 模拟master1宕机,手动停止mysql服务,观察monitor日志,master1的日志如下: [root@monitor1 ~]# tail -f /var/log/mysql-mmm/mmm_mond.log 2017/01/09 22:02:55 WARN Check 'rep_threads' on 'master1' is in unknown state! Message: UNKNOWN: Connect error (host = 192.168.31.83:3306, user = mmm_monitor)! Can't connect to MySQL server on '192.168.31.83' (111) 2017/01/09 22:02:55 WARN Check 'rep_backlog' on 'master1' is in unknown state! Message: UNKNOWN: Connect error (host = 192.168.31.83:3306, user = mmm_monitor)! Can't connect to MySQL server on '192.168.31.83' (111) 2017/01/09 22:03:05 ERROR Check 'mysql' on 'master1' has failed for 10 seconds! Message: ERROR: Connect error (host = 192.168.31.83:3306, user = mmm_monitor)! Can't connect to MySQL server on '192.168.31.83' (111) 2017/01/09 22:03:07 FATAL State of host 'master1' changed from ONLINE to HARD_OFFLINE (ping: OK, mysql: not OK) 2017/01/09 22:03:07 INFO Removing all roles from host 'master1': 2017/01/09 22:03:07 INFO Removed role 'writer(192.168.31.2)' from host 'master1' 2017/01/09 22:03:07 INFO Orphaned role 'writer(192.168.31.2)' has been assigned to 'master2' 查看群集的最新状态 [root@monitor1 ~]# mmm_control show master1(192.168.31.83) master/HARD_OFFLINE. Roles: master2(192.168.31.141) master/ONLINE. Roles: reader(192.168.31.5), writer(192.168.31.2) slave1(192.168.31.250) slave/ONLINE. Roles: reader(192.168.31.4) slave2(192.168.31.225) slave/ONLINE. Roles: reader(192.168.31.3) 从显示结果可以看出master1的状态有ONLINE转换为HARD_OFFLINE,写VIP转移到了master2主机上。 检查所有的db服务器群集状态 [root@monitor1 ~]# mmm_control checks all master1 ping [last change: 2017/01/09 21:31:47] OK master1 mysql [last change: 2017/01/09 22:03:07] ERROR: Connect error (host = 192.168.31.83:3306, user = mmm_monitor)! Can't connect to MySQL server on '192.168.31.83' (111) master1 rep_threads [last change: 2017/01/09 21:31:47] OK master1 rep_backlog [last change: 2017/01/09 21:31:47] OK: Backlog is null slave1 ping [last change: 2017/01/09 21:31:47] OK slave1mysql [last change: 2017/01/09 21:31:47] OK slave1 rep_threads [last change: 2017/01/09 21:31:47] OK slave1 rep_backlog [last change: 2017/01/09 21:31:47] OK: Backlog is null master2 ping [last change: 2017/01/09 21:31:47] OK master2 mysql [last change: 2017/01/09 21:57:32] OK master2 rep_threads [last change: 2017/01/09 21:31:47] OK master2 rep_backlog [last change: 2017/01/09 21:31:47] OK: Backlog is null slave2 ping [last change: 2017/01/09 21:31:47] OK slave2mysql [last change: 2017/01/09 21:31:47] OK slave2 rep_threads [last change: 2017/01/09 21:31:47] OK slave2 rep_backlog [last change: 2017/01/09 21:31:47] OK: Backlog is null 从上面可以看到master1能ping通,说明只是服务死掉了。
查看master2主机的ip地址:
[root@master2 ~]# ipaddr show dev eno16777736
eno16777736:
附:
1、日志文件:
日志文件往往是分析错误的关键,所以要善于利用日志文件进行问题分析。
db端:/var/log/mysql-mmm/mmm_agentd.log
监控端:/var/log/mysql-mmm/mmm_mond.log
2、命令文件:
mmm_agentd:db代理进程的启动文件
mmm_mond:监控进程的启动文件
mmm_backup:备份文件
mmm_restore:还原文件
mmm_control:监控操作命令文件
db服务器端只有mmm_agentd程序,其它的都是在monitor服务器端。
3、mmm_control用法
mmm_control程序可以用于监控群集状态、切换writer、设置onlineoffline操作等。
Valid commands are:
help - show this message #帮助信息
ping - ping monitor #ping当前的群集是否正常
show - show status #群集在线状态检查
checks [
4、其它处理问题
如果不想让writer从master切换到backup(包括主从的延时也会导致写VIP的切换),那么可以在配置/etc/mysql-mmm/mmm_common.conf时,去掉
5、总结
1.对外提供读写的虚拟IP是由monitor程序控制。如果monitor没有启动那么db服务器不会被分配虚拟ip,但是如果已经分配好了虚拟ip,当monitor程序关闭了原先分配的虚拟ip不会立即关闭外部程序还可以连接访问(只要不重启网络),这样的好处就是对于monitor的可靠性要求就会低一些,但是如果这个时候其中的某一个db服务器故障了就无法处理切换,也就是原先的虚拟ip还是维持不变,挂掉的那台DB的虚拟ip会变的不可访问。
2.agent程序受monitor程序的控制处理write切换,从库切换等操作。如果monitor进程关闭了那么agent进程就起不到什么作用,它本身不能处理故障。
3.monitor程序负责监控db服务器的状态,包括Mysql数据库、服务器是否运行、复制线程是否正常、主从延时等;它还用于控制agent程序处理故障。
4.monitor会每隔几秒钟监控db服务器的状态,如果db服务器已经从故障变成了正常,那么monitor会自动在60s之后将其设置为online状态(默认是60s可以设为其它的值),有监控端的配置文件参数“auto_set_online”决定,群集服务器的状态有三种分别是:HARD_OFFLINE→AWAITING_RECOVERY→online 5.默认monitor会控制mmm_agent会将writer db服务器read_only修改为OFF,其它的db服务器read_only修改为ON,所以为了严谨可以在所有的服务器的my.cnf文件中加入read_only=1由monitor控制来控制writer和read,root用户和复制用户不受read_only参数的影响。