Oracle数据库4031故障分析报告
点击上方“IT那活儿”公众号--专注于企业全栈运维技术分享,不管IT什么活儿,干就完了!!!
通过查看问题前的ash,发现在数据库异常之前,数据库的主要等待事件是:
SGA: allocation forcing component growth。Alert.log:<br>Errors in file u01/app/oracle/diag/rdbms/*/trace/db2_ora_6519.trc<br>ORA-04031: unable to allocate 63496 bytes of shared memory ("shared pool","DBMS_BACKUP_RESTORE","PLMCD^a5f238a0","BAMIMA: Bam Buffer")<br>Incident details in: /u01/app/oracle/diag/rdbms/*/incident/incdir_218692/db2_ora_6519_i218692.trc<br>Use ADRCI or Support Workbench to package the incident.<br>See Note 411.1 at My Oracle Support for error and packaging details.<br>2022-10-03T00:16:41.321084+08:00<br>ORA-04031 heap dump being written to trace file /u01/app/oracle/diag/rdbms/*/incident/incdir_218692/db2_ora_6519_i218692.trc<br>2022-10-03T00:19:38.948488+08:00<br>Errors in file /u01/app/oracle/diag/rdbms/*/trace/b2_m000_8030.trc (incident=218693):<br>ORA-04031: unable to allocate 57768 bytes of shared memory ("shared pool","DBMS_STATS_INTERNAL","PLMCD^60dfd26c","BAMIMA: Bam Buffer")<br>Incident details in: /u01/app/oracle/diag/rdbms/*/incident/incdir_218693/b2_m000_8030_i218693.trc<br>Use ADRCI or Support Workbench to package the incident.<br>See Note 411.1 at My Oracle Support for error and packaging details.<br>2022-10-03T00:19:42.233916+08:00<br>Errors in file /u01/app/oracle/diag/rdbms/*/trace/db2_ora_8166.trc (incident=218700):<br>ORA-04031: unable to allocate 63496 bytes of shared memory ("shared pool","DBMS_BACKUP_RESTORE","PLMCD^a5f238a0","BAMIMA: Bam Buffer")<br>Incident details in: /u01/app/oracle/diag/rdbms/*/incident/incdir_218700/db2_ora_8166_i218700.trc<br>Use ADRCI or Support Workbench to package the incident.<br>See Note 411.1 at My Oracle Support for error and packaging details.<br>…<br>2022-10-03T04:23:42.937853+08:00<br>Errors in file /u01/app/oracle/diag/rdbms/*/trace/db2_lmd0_7330.trc (incident=215852):<br>ORA-04031: unable to allocate 13840 bytes of shared memory ("shared pool","unknown object","sga heap(1,0)","ges resource dynamic")<br>Use ADRCI or Support Workbench to package the incident.<br>See Note 411.1 at My Oracle Support for error and packaging details.<br>2022-10-03T04:23:49.345977+08:00<br>Errors in file /u01/app/oracle/diag/rdbms/*/trace/db2_lmd0_7330.trc (incident=215853):<br>ORA-04031: unable to allocate 13840 bytes of shared memory ("shared pool","unknown object","sga heap(1,0)","ges resource dynamic")<br>Use ADRCI or Support Workbench to package the incident.<br>See Note 411.1 at My Oracle Support for error and packaging details.<br>…<br>2022-10-03T08:58:07.793554+08:00<br>DDE: Problem Key ORA 4031 was completely flood controlled (0x6)<br>Further messages for this problem key will be suppressed for up to 10 minutes<br>2022-10-03T09:08:09.957747+08:00<br>DDE: Problem Key ORA 4031 was completely flood controlled (0x6)<br>Further messages for this problem key will be suppressed for up to 10 minutes<br>2022-10-03T09:18:12.005452+08:00<br>DDE: Problem Key ORA 4031 was completely flood controlled (0x6)<br>Further messages for this problem key will be suppressed for up to 10 minutes<br>2022-10-03T09:28:12.835118+08:00<br>DDE: Problem Key ORA 4031 was completely flood controlled (0x6)<br><br>022-10-03T09:38:13.206769+08:00<br>DDE: Problem Key ORA 4031 was completely flood controlled (0x6)<br>Further messages for this problem key will be suppressed for up to 10 minutes<br>2022-10-03T09:44:00.615729+08:00<br>Thread 2 advanced to log sequence 29428 (LGWR switch)<br>Current log# 8 seq# 29428 mem# 0: +DG_DATA_SSD_1/N*/ONLINELOG/group_8.276.1008547021<br>Current log# 8 seq# 29428 mem# 1: +DG_DATA_SSD_1/N*/ONLINELOG/group_8.277.1008547025<br>2022-10-03T09:44:01.334950+08:00<br>TT03: Standby redo logfile selected for thread 2 sequence 29428 for destination LOG_ARCHIVE_DEST_2<br>2022-10-03T09:44:03.799630+08:00<br>Archived Log entry 119834 added for T-2.S-29427 ID 0x79a6de54 LAD:1<br>2022-10-03T09:48:17.418439+08:00<br>DDE: Problem Key ORA 4031 was completely flood controlled (0x6)<br>Further messages for this problem key will be suppressed for up to 10 minutes<br>2022-10-03T09:58:19.513479+08:00<br>DDE: Problem Key ORA 4031 was completely flood controlled (0x6)<br>Further messages for this problem key will be suppressed for up to 10 minutes<br>2022-10-03T10:08:21.022510+08:00<br>DDE: Problem Key ORA 4031 was completely flood controlled (0x6)<br>Further messages for this problem key will be suppressed for up to 10 minutes<br>2022-10-03T10:14:50.702688+08:00<br>LCK1 (ospid: 7342) waits for event libcache interrupt action by LCK for 73 secs.<br>2022-10-03T10:14:50.702867+08:00<br>LCK1 (ospid: 7342) is hung in an acceptable location (libcache 0x41.02).<br>2022-10-03T10:17:11.454316+08:00<br>LCK1 (ospid: 7342) waits for event libcache interrupt action by LCK for 214 secs.<br>2022-10-03T10:17:11.454494+08:00<br>LCK1 (ospid: 7342) is hung in an acceptable location (libcache 0x41.02).<br>2022-10-03T10:18:23.645094+08:00<br>DDE: Problem Key ORA 4031 was completely flood controlled (0x6)<br>Further messages for this problem key will be suppressed for up to 10 minutes<br>2022-10-03T10:21:42.845553+08:00<br>LCK1 (ospid: 7342) waits for event libcache interrupt action by LCK for 79 secs.<br>2022-10-03T10:21:42.845731+08:00<br>LCK1 (ospid: 7342) is hung in an acceptable location (libcache 0x41.02).<br>2022-10-03T10:24:20.692124+08:00<br>LCK1 (ospid: 7342) waits for event libcache interrupt action by LCK for 80 secs.<br>2022-10-03T10:24:20.692681+08:00<br>LCK1 (ospid: 7342) is hung in an acceptable location (libcache 0x41.02).<br>2022-10-03T10:27:57.458418+08:00<br>CKPT (ospid: 7350) waits for event enq: XR - database force logging for 78 secs.<br>2022-10-03T10:28:07.628285+08:00<br>CKPT (ospid: 7350) waits for event enq: XR - database force logging for 88 secs.<br>2022-10-03T10:28:17.668429+08:00<br>CKPT (ospid: 7350) waits for event enq: XR - database force logging for 98 secs.
通过ash趋势和alert.log发现,数据库在出现enq: TX - index contention之前,数据库主要等待是SGA: allocation forcing component growth和library cache lock,并且alert.log从0点开始出现ora-4031持续到异常时段。查看数据库内存使用情况:Buffer cache的内存非常小,只有128m,而shared pool占用了32G。
Shared pool中ges esource dynamic和ges enqueues分别占用了12G和8.9G,这通常是不正常的。说明由于这两个组件占用了过多的内存,导致数据库的ORA-4031,进而出来sga resize,进而影响了数据库的整体性能。分析lmhb.trc:
解决方案
For Oracle Versions >= 12.2 but BELOW 19.1
通过重启实例后,ges resource dynamic空间释放,但ges enqueues还有8g,目前看已经缓解问题。最终建议通过补丁解决。