技术分享 | OceanBase 手滑误删了数据文件怎么办

作者:张乾

外星人2号,现兼任六位喵星人的资深铲屎官。

本文来源:原创投稿

* 爱可生开源社区出品,原创内容未经授权不得随意使用,转载请联系小编并注明来源。

手滑误删了数据文件,并且没有可替换的节点时,先别急着提桶跑路,可以考虑利用参数 server_permanent_offline_time 来重建受影响的节点。

原理:

server_permanent_offline_time 是 OceanBase 数据库中用于控制节点永久下线时长的参数。当集群中的某个节点宕机后,系统会根据该参数的设置值来进行相应操作。

如果节点宕机时间小于该参数设置的值,系统会暂时不做处理,以避免频繁的数据迁移;如果宕机时间超过该参数设置的值,该节点被标记为永久下线,RootService 会将该 OBServer 上包含的数据副本从 Paxos 成员组中删除,并在同 zone 内其他可用  OBServer 上补充数据,以保证数据副本 Paxos 成员组完整。该参数默认值是 3600 秒,一般设置较大,以避免不必要的副本复制。此外,当永久下线的节点重新被拉起后,其上的全部数据都需要从其他副本重新拉取。

在本场景下,即是通过调低该参数,让故障节点快速永久下线再重新上线,达到数据重建的目的。

请注意,此过程会占用集群一定的资源,可能会影响性能,因此建议在业务低峰期进行。

官方建议

关于 server_permanent_offline_time 的适用场景和建议值,官方提供如下:

1. OceanBase 数据库版本升级场景:建议将该配置项的值设置为72h。

  1. OBServer 硬件更换场景:建议将该配置项的值设置为4h。

  2. OBServer 清空上线场景:建议将该配置项的值设置为10m,使集群快速上线。

    准备过程

    预备一套环境

    使用OBD工具快速部署一套3节点OB以及一个OBProxy,再创建好一个租户sysbench_tenant,primary_zone为RANDOM。

    注:本文基于OB 3.1.2版本,其他版本需注意另作验证。


    版本 ip
    oceanbase 3.1.2 10.186.64.74
    10.186.64.75
    10.186.64.79
    OBProxy 3.2.3 10.186.60.3

    准备些数据

    使用 sysbench 创建一个表 sbtest1 并插入1W数据。

    sysbench ./oltp_insert.lua --mysql-host=10.186.60.3 --mysql-port=2883 --mysql-db=sysbenchdb --mysql-user="sysbench@sysbench_tenant"  --mysql-password=sysbench --tables=1 --table_size=10000 --threads=1 --time=600 --report-interval=10 --db-driver=mysql --db-ps-mode=disable --skip-trx=on --mysql-ignore-errors=6002,6004,4012,2013,4016,1062,5157,4038 prepare<br><br>

    这里改写了 sysbench 的建表语句,分了3个区,查询 sbtest1 表分区副本分布如下

    MySQL [oceanbase]> select tenant.tenant_name, zone, svr_ip,svr_port, case when role=1 then 'leader' when role=2 then 'follower' else NULL end as role, count(1) as partition_cnt from __all_virtual_meta_table meta  inner join __all_tenant tenant  on meta.tenant_id=tenant.tenant_id inner join __all_virtual_table tab  on meta.tenant_id=tab.tenant_id and meta.table_id=tab.table_id where tenant.tenant_id=1001 and tab.table_name='sbtest1' group by  tenant.tenant_name,zone, svr_ip,svr_port, 5 order by  tenant.tenant_name, zone, svr_ip, role desc;<br>+-----------------+-------+--------------+----------+----------+---------------+<br>| tenant_name     | zone  | svr_ip       | svr_port | role     | partition_cnt |<br>+-----------------+-------+--------------+----------+----------+---------------+<br>| sysbench_tenant | zone1 | 10.186.64.74 |     2882 | leader   |             1 |<br>| sysbench_tenant | zone1 | 10.186.64.74 |     2882 | follower |             2 |<br>| sysbench_tenant | zone2 | 10.186.64.75 |     2882 | leader   |             1 |<br>| sysbench_tenant | zone2 | 10.186.64.75 |     2882 | follower |             2 |<br>| sysbench_tenant | zone3 | 10.186.64.79 |     2882 | leader   |             1 |<br>| sysbench_tenant | zone3 | 10.186.64.79 |     2882 | follower |             2 |<br>+-----------------+-------+--------------+----------+----------+---------------+<br>

    开始实验

    使用 sysbench 持续写入数据,维持一定的流量,便于在节点重建后对比各节点数据是否一致。

    sysbench ./oltp_insert.lua --mysql-host=10.186.60.3 --mysql-port=2883 --mysql-db=sysbenchdb --mysql-user="sysbench@sysbench_tenant"  --mysql-password=sysbench --tables=1 --table_size=10000 --threads=1 --time=300 --report-interval=10 --db-driver=mysql --db-ps-mode=disable --skip-trx=on --mysql-ignore-errors=6002,6004,4012,2013,4016,1062,5157,4038 run<br>

    删除某节点的数据文件

    选择 zone3 下的 10.186.64.79 节点,将数据文件删除。

    [root@localhost data]# rm -rf 1/sstable/block_file<br>[root@localhost data]# cd 1/sstable/<br>[root@localhost sstable]# ll<br>total 0<br>

    永久下线故障节点

  3. 调小参数 server_permanent_offline_time ,缩短节点永久下线时间

    server_permanent_offline_time 默认值为 3600s

    MySQL [oceanbase]> alter system set server_permanent_offline_time='60s';<br>Query OK, 0 rows affected (0.030 sec)<br> <br>MySQL [oceanbase]> SHOW PARAMETERS LIKE "%server_permanent_offline_time%";<br>+-------+----------+--------------+----------+-------------------------------+-----------+-------+-----------------------------------------------------------------------------------------------------------------------------------+--------------+---------+---------+-------------------+<br>| zone  | svr_type | svr_ip       | svr_port | name                          | data_type | value | info                                                                                                                              | section      | scope   | source  | edit_level        |<br>+-------+----------+--------------+----------+-------------------------------+-----------+-------+-----------------------------------------------------------------------------------------------------------------------------------+--------------+---------+---------+-------------------+<br>| zone3 | observer | 10.186.64.79 |     2882 | server_permanent_offline_time | NULL      | 60s   | the time interval between any two heartbeats beyond which a server is considered to be 'permanently' offline. Range: [20s,+∞)   | ROOT_SERVICE | CLUSTER | DEFAULT | DYNAMIC_EFFECTIVE |<br>| zone1 | observer | 10.186.64.74 |     2882 | server_permanent_offline_time | NULL      | 60s   | the time interval between any two heartbeats beyond which a server is considered to be 'permanently' offline. Range: [20s,+∞)   | ROOT_SERVICE | CLUSTER | DEFAULT | DYNAMIC_EFFECTIVE |<br>| zone2 | observer | 10.186.64.75 |     2882 | server_permanent_offline_time | NULL      | 60s   | the time interval between any two heartbeats beyond which a server is considered to be 'permanently' offline. Range: [20s,+∞)   | ROOT_SERVICE | CLUSTER | DEFAULT | DYNAMIC_EFFECTIVE |<br>+-------+----------+--------------+----------+-------------------------------+-----------+-------+-----------------------------------------------------------------------------------------------------------------------------------+--------------+---------+---------+-------------------+<br>
  4. 停止故障节点对外提供服务

    在 kill ob 进程前,建议使用隔离(ISOLATE SERVER)或者停止(STOP SERVER)节点的命令,停掉发往该节点的请求,转移副本 leader 角色。在节点重建恢复后,再开启流量。

    # 停掉79节点服务<br>MySQL [oceanbase]> ALTER SYSTEM STOP SERVER '10.186.64.79:2882' ZONE='zone3';<br>  <br># 或者隔离<br>ALTER SYSTEM ISOLATE SERVER '10.186.64.79:2882' ZONE='zone3';<br>
  5. kill observer进程

    执行 kill -9 $observer_pid ,等待 server_permanent_offline_time 的时间,该ob进入"永久下线”状态。判断ob是否已经永久下线,可以查询表 __all_rootservice_event_history,存在名为 "permanent_offline "的event记录,确认时间和ip都一致后,即可认为ob已经永久下线。

    MySQL [oceanbase]> select * from __all_rootservice_event_history where event='permanent_offline' ;                       <br>+----------------------------+--------+-------------------+--------+---------------------+-------+--------+-------+--------+-------+--------+-------+--------+-------+--------+------------+--------------+-------------+<br>| gmt_create                 | module | event             | name1  | value1              | name2 | value2 | name3 | value3 | name4 | value4 | name5 | value5 | name6 | value6 | extra_info | rs_svr_ip    | rs_svr_port |<br>+----------------------------+--------+-------------------+--------+---------------------+-------+--------+-------+--------+-------+--------+-------+--------+-------+--------+------------+--------------+-------------+<br>| 2023-03-29 17:34:09.596035 | server | permanent_offline | server | "10.186.64.79:2882" |       |        |       |        |       |        |       |        |       |        |            | 10.186.64.74 |        2882 |<br>+----------------------------+--------+-------------------+--------+---------------------+-------+--------+-------+--------+-------+--------+-------+--------+-------+--------+------------+--------------+-------------+<br>

    查询分区副本分布如下,已不存在79节点的分区副本信息,进一步确认了79节点已永久下线。

    zone2 下的75节点有一个从副本升级为leader角色,此时集群仍然可以继续对外服务。

    MySQL [oceanbase]> select tenant.tenant_name, zone, svr_ip,svr_port, case when role=1 then 'leader' when role=2 then 'follower' else NULL end as role, count(1) as partition_cnt from __all_virtual_meta_table meta  inner join __all_tenant tenant  on meta.tenant_id=tenant.tenant_id inner join __all_virtual_table tab  on meta.tenant_id=tab.tenant_id and meta.table_id=tab.table_id where tenant.tenant_id=1001 and tab.table_name='sbtest1' group by  tenant.tenant_name,zone, svr_ip,svr_port, 5 order by  tenant.tenant_name, zone, svr_ip, role desc;<br>+-----------------+-------+--------------+----------+----------+---------------+<br>| tenant_name     | zone  | svr_ip       | svr_port | role     | partition_cnt |<br>+-----------------+-------+--------------+----------+----------+---------------+<br>| sysbench_tenant | zone1 | 10.186.64.74 |     2882 | leader   |             1 |<br>| sysbench_tenant | zone1 | 10.186.64.74 |     2882 | follower |             2 |<br>| sysbench_tenant | zone2 | 10.186.64.75 |     2882 | leader   |             2 |<br>| sysbench_tenant | zone2 | 10.186.64.75 |     2882 | follower |             1 |<br>+-----------------+-------+--------------+----------+----------+---------------+<br>4 rows in set (0.005 sec)<br>

    拉起故障节点,触发数据自动重建

  6. 启动79节点的ob进程,进程启动后会自动触发重建。

    注:防止ob启动失败或存在其他问题,建议启动前将数据文件和事务日志均清空。

    [root@localhost data]# rm -rf log1/clog/*<br>[root@localhost data]# rm -rf log1/ilog/*<br>[root@localhost data]# rm -rf log1/slog/*<br>[root@localhost data]# rm -rf 1/sstable/block_file<br>[root@localhost data]# cd 1/sstable/<br>[root@localhost sstable]# ll<br>total 0<br>[root@localhost sstable]# su admin<br>bash-4.2$ cd /home/admin/ && ./bin/observer<br>./bin/observer<br>

    进程启动后,确认ob心跳恢复状态为active,然后查看分区正在不断补足中

    MySQL [oceanbase]> select svr_ip,zone,with_rootserver,status,stop_time,start_service_time,build_version from __all_server;<br>+--------------+-------+-----------------+--------+------------------+--------------------+----------------------------------------------------------------------------------------+<br>| svr_ip       | zone  | with_rootserver | status | stop_time        | start_service_time | build_version                                                                          |<br>+--------------+-------+-----------------+--------+-----------+---------------------------+----------------------------------------------------------------------------------------+<br>| 10.186.64.74 | zone1 |               1 | active |                0 |   1679984798650860 | 3.1.2_10000392021123010-d4ace121deae5b81d8f0b40afbc4c02705b7fc1d(Dec 30 2021 02:47:29) |<br>| 10.186.64.75 | zone2 |               0 | active |                0 |   1679984801289281 | 3.1.2_10000392021123010-d4ace121deae5b81d8f0b40afbc4c02705b7fc1d(Dec 30 2021 02:47:29) |<br>| 10.186.64.79 | zone3 |               0 | active | 1680082329964975 |   1680082511964975 | 3.1.2_10000392021123010-d4ace121deae5b81d8f0b40afbc4c02705b7fc1d(Dec 30 2021 02:47:29) |<br>+--------------+-------+-----------------+--------+------------------+--------------------+----------------------------------------------------------------------------------------+<br>3 rows in set (0.002 sec)<br> <br>MySQL [oceanbase]> select count(*),zone from gv$partition group by zone;<br>+----------+-------+<br>| count(*) | zone  |<br>+----------+-------+<br>|     1322 | zone1 |<br>|     1322 | zone2 |<br>|      152 | zone3 |<br>+----------+-------+<br>3 rows in set (0.228 sec)<br>  <br>MySQL [oceanbase]> select count(*),zone from gv$partition group by zone;<br>+----------+-------+<br>| count(*) | zone  |<br>+----------+-------+<br>|     1322 | zone1 |<br>|     1322 | zone2 |<br>|      664 | zone3 |<br>+----------+-------+<br>3 rows in set (0.113 sec)<br>MySQL [oceanbase]> select count(*),zone from gv$partition group by zone;                                                <br>+----------+-------+<br>| count(*) | zone  |<br>+----------+-------+<br>|     1322 | zone1 |<br>|     1322 | zone2 |<br>|     1179 | zone3 |<br>+----------+-------+<br>3 rows in set (0.112 sec)<br>  <br>MySQL [oceanbase]> select count(*),zone from gv$partition group by zone;<br>+----------+-------+<br>| count(*) | zone  |<br>+----------+-------+<br>|     1322 | zone1 |<br>|     1322 | zone2 |<br>|     1322 | zone3 |<br>+----------+-------+<br>3 rows in set (0.116 sec)<br>

    当3个zone内的分区个数一致后,同时查看zone3已存在副本信息,认为重建完毕。

    由于79节点处于隔离状态,所以还没有leader副本。

    MySQL [oceanbase]> select tenant.tenant_name, zone, svr_ip,svr_port, case when role=1 then 'leader' when role=2 then 'follower' else NULL end as role, count(1) as partition_cnt from __all_virtual_meta_table meta  inner join __all_tenant tenant  on meta.tenant_id=tenant.tenant_id inner join __all_virtual_table tab  on meta.tenant_id=tab.tenant_id and meta.table_id=tab.table_id where tenant.tenant_id=1001 and tab.table_name='sbtest1' group by  tenant.tenant_name,zone, svr_ip,svr_port, 5 order by  tenant.tenant_name, zone, svr_ip, role desc;<br>+-----------------+-------+--------------+----------+----------+---------------+<br>| tenant_name     | zone  | svr_ip       | svr_port | role     | partition_cnt |<br>+-----------------+-------+--------------+----------+----------+---------------+<br>| sysbench_tenant | zone1 | 10.186.64.74 |     2882 | leader   |             1 |<br>| sysbench_tenant | zone1 | 10.186.64.74 |     2882 | follower |             2 |<br>| sysbench_tenant | zone2 | 10.186.64.75 |     2882 | leader   |             2 |<br>| sysbench_tenant | zone2 | 10.186.64.75 |     2882 | follower |             1 |<br>| sysbench_tenant | zone3 | 10.186.64.79 |     2882 | follower |             3 |<br>+-----------------+-------+--------------+----------+----------+---------------+<br>6 rows in set (0.005 sec)<br>
  7. 开启故障节点服务

    执行命令解除79节点的隔离状态。

    ALTER SYSTEM START SERVER '10.186.64.79:2882' ZONE='zone3';<br>

    查询分区副本分布如下,leader角色已迁回79节点。

    MySQL [oceanbase]> select tenant.tenant_name, zone, svr_ip,svr_port, case when role=1 then 'leader' when role=2 then 'follower' else NULL end as role, count(1) as partition_cnt from __all_virtual_meta_table meta  inner join __all_tenant tenant  on meta.tenant_id=tenant.tenant_id inner join __all_virtual_table tab  on meta.tenant_id=tab.tenant_id and meta.table_id=tab.table_id where tenant.tenant_id=1001 and tab.table_name='sbtest1' group by  tenant.tenant_name,zone, svr_ip,svr_port, 5 order by  tenant.tenant_name, zone, svr_ip, role desc;<br>+-----------------+-------+--------------+----------+----------+---------------+<br>| tenant_name     | zone  | svr_ip       | svr_port | role     | partition_cnt |<br>+-----------------+-------+--------------+----------+----------+---------------+<br>| sysbench_tenant | zone1 | 10.186.64.74 |     2882 | leader   |             1 |<br>| sysbench_tenant | zone1 | 10.186.64.74 |     2882 | follower |             2 |<br>| sysbench_tenant | zone2 | 10.186.64.75 |     2882 | leader   |             1 |<br>| sysbench_tenant | zone2 | 10.186.64.75 |     2882 | follower |             2 |<br>| sysbench_tenant | zone3 | 10.186.64.79 |     2882 | leader   |             1 |<br>| sysbench_tenant | zone3 | 10.186.64.79 |     2882 | follower |             2 |<br>+-----------------+-------+--------------+----------+----------+---------------+<br>

    3. 把 server_permanent_offline_time 参数恢复为默认值3600s

    MySQL [oceanbase]> alter system set server_permanent_offline_time='3600s';<br>Query OK, 0 rows affected (0.028 sec)<br> <br>MySQL [oceanbase]> SHOW PARAMETERS LIKE "%server_permanent_offline_time%";<br>+-------+----------+--------------+----------+-------------------------------+-----------+-------+-----------------------------------------------------------------------------------------------------------------------------------+--------------+---------+---------+-------------------+<br>| zone  | svr_type | svr_ip       | svr_port | name                          | data_type | value | info                                                                                                                              | section      | scope   | source  | edit_level        |<br>+-------+----------+--------------+----------+-------------------------------+-----------+-------+-----------------------------------------------------------------------------------------------------------------------------------+--------------+---------+---------+-------------------+<br>| zone2 | observer | 10.186.64.75 |     2882 | server_permanent_offline_time | NULL      | 3600s | the time interval between any two heartbeats beyond which a server is considered to be 'permanently' offline. Range: [20s,+∞)   | ROOT_SERVICE | CLUSTER | DEFAULT | DYNAMIC_EFFECTIVE |<br>| zone1 | observer | 10.186.64.74 |     2882 | server_permanent_offline_time | NULL      | 3600s | the time interval between any two heartbeats beyond which a server is considered to be 'permanently' offline. Range: [20s,+∞)   | ROOT_SERVICE | CLUSTER | DEFAULT | DYNAMIC_EFFECTIVE |<br>| zone3 | observer | 10.186.64.79 |     2882 | server_permanent_offline_time | NULL      | 3600s | the time interval between any two heartbeats beyond which a server is considered to be 'permanently' offline. Range: [20s,+∞)   | ROOT_SERVICE | CLUSTER | DEFAULT | DYNAMIC_EFFECTIVE |<br>+-------+----------+--------------+----------+-------------------------------+-----------+-------+-----------------------------------------------------------------------------------------------------------------------------------+--------------+---------+---------+-------------------+<br>3 rows in set (0.007 sec)<br>

    校验各ob节点数据量

    sysbench 已运行结束,直连各 observer ,校验数据量是一致的。

    [root@localhost ~]#  obclient -h10.186.64.74 -P2881 -usysbench@sysbench_tenant -Dsysbenchdb -A -psysbench<br>Welcome to the OceanBase.  Commands end with ; or g.<br>Your MySQL connection id is 3221545401<br>Server version: 5.7.25 OceanBase 3.1.2 (r10000392021123010-d4ace121deae5b81d8f0b40afbc4c02705b7fc1d) (Built Dec 30 2021 02:47:29)<br> <br>Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.<br> <br>Type 'help;' or 'h' for help. Type 'c' to clear the current input statement.<br> <br>MySQL [sysbenchdb]> select count(*) from sbtest1;<br>+----------+<br>| count(*) |<br>+----------+<br>|    53195 |<br>+----------+<br>1 row in set (0.036 sec)<br> <br>MySQL [sysbenchdb]> exit<br>Bye<br>[root@localhost ~]#  obclient -h10.186.64.75 -P2881 -usysbench@sysbench_tenant -Dsysbenchdb -A -psysbench<br>Welcome to the OceanBase.  Commands end with ; or g.<br>Your MySQL connection id is 3221823448<br>Server version: 5.7.25 OceanBase 3.1.2 (r10000392021123010-d4ace121deae5b81d8f0b40afbc4c02705b7fc1d) (Built Dec 30 2021 02:47:29)<br> <br>Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.<br> <br>Type 'help;' or 'h' for help. Type 'c' to clear the current input statement.<br> <br>MySQL [sysbenchdb]> select count(*) from sbtest1;<br>+----------+<br>| count(*) |<br>+----------+<br>|    53195 |<br>+----------+<br>1 row in set (0.040 sec)<br> <br>MySQL [sysbenchdb]> exit<br>Bye<br>[root@localhost ~]#  obclient -h10.186.64.79 -P2881 -usysbench@sysbench_tenant -Dsysbenchdb -A -psysbench<br>Welcome to the OceanBase.  Commands end with ; or g.<br>Your MySQL connection id is 3222011907<br>Server version: 5.7.25 OceanBase 3.1.2 (r10000392021123010-d4ace121deae5b81d8f0b40afbc4c02705b7fc1d) (Built Dec 30 2021 02:47:29)<br> <br>Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.<br> <br>Type 'help;' or 'h' for help. Type 'c' to clear the current input statement.<br> <br>MySQL [sysbenchdb]> select count(*) from sbtest1;<br>+----------+<br>| count(*) |<br>+----------+<br>|    53195 |<br>+----------+<br>1 row in set (0.037 sec)<br> <br>MySQL [sysbenchdb]><br>

    总结

    数据文件损坏或者丢失时,可通过调整参数 server_permanent_offline_time 来重建受影响的节点。

    1. 设小 server_permanent_offline_time 阈值。

    1. 停止故障节点对外服务。

    2. 终止该节点进程。

    3. 超过阈值后,节点将被标记为永久下线,系统会自动清空副本以及向同zone内其他节点迁移数据。

    4. 启动 OB 进程,自动触发重建节点数据。

    5. 开启故障节点服务。

    7. 把 server_permanent_offline_time 参数改回原来的值。

             本文关键字 
             :#OceanBase# #重建节点# 
    
          <h5 data-tool="mdnice编辑器"></h5> 

    文章推荐:

    新特性解读 | MySQL 8.0.28 用户连接内存限制

    新特性解读 | MySQL 8.0 窗口函数一次疑问解答

    新特性解读 | MySQL 8.0 新密码策略(下)

    新特性解读 | MySQL 8.0 新密码策略(中)

    关于SQLE

    爱可生开源社区的 SQLE 是一款面向数据库使用者和管理者,支持多场景审核,支持标准化上线流程,原生支持 MySQL 审核且数据库类型可扩展的 SQL 审核工具。

    SQLE 获取

    类型 地址
    版本库 https://github.com/actiontech/sqle
    文档 https://actiontech.github.io/sqle-docs-cn/
    发布信息 https://github.com/actiontech/sqle/releases
    数据审核插件开发文档 https://actiontech.github.io/sqle-docs-cn/3.modules/3.7_auditplugin/auditplugin_development.html

    提交有效pr,高质量issue,将获赠面值200-500元(具体面额依据质量而定)京东卡以及爱可生开源社区精美周边!

    更多关于 SQLE 的信息和交流,请加入官方QQ交流群:637150065...

本文分享自微信公众号 - 爱可生开源社区(ActiontechOSS)。
如有侵权,请联系 support@oschina.cn 删除。
本文参与“OSC源创计划”,欢迎正在阅读的你也加入,一起分享。