
Using OS crash dumps to find out why a Grid cluster node rebooted

Author: (from the Internet)

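By default, Linux cannot analyze kernel core files (vmcore). You first need to install the kernel debuginfo packages and the crash analysis tool.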

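Install the kernel debuginfo RPMs (they must match the running kernel version) from the following URL, then install crash with yum: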
https://oss.oracle.com/ol7/debuginfo/
kernel-uek-debuginfo-4.14.35-1902.3.2.el7uek.x86_64.rpm
kernel-uek-debuginfo-common-4.14.35-1902.3.2.el7uek.x86_64.rpm
yum install crash

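After installation, check that the debuginfo packages match the running kernel (uname -r) and that crash is installed: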

[root@ht02 ~]# rpm -qa|grep kernel-uek-debuginfo
kernel-uek-debuginfo-common-4.14.35-1902.3.2.el7uek.x86_64
kernel-uek-debuginfo-4.14.35-1902.3.2.el7uek.x86_64
[root@ht02 ~]# uname -r
4.14.35-1902.3.2.el7uek.x86_64
[root@ht02 ~]# rpm -qa|grep crash
crash-7.2.3-10.el7.x86_64
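crash only works with the vmlinux that exactly matches the kernel that produced the dump, so the check above is worth scripting. The sketch below is a minimal, hypothetical helper (check_debuginfo is not an Oracle or Red Hat tool); it only compares package names against a kernel release string.

```shell
#!/usr/bin/env bash
# Hypothetical helper: verify that both UEK debuginfo packages exist for a kernel release.
# $1 = kernel release (e.g. output of `uname -r`)
# $2 = installed package list (e.g. output of `rpm -qa | grep kernel-uek-debuginfo`)
check_debuginfo() {
  local release="$1"
  local pkgs="$2"
  local want
  for want in "kernel-uek-debuginfo-${release}" "kernel-uek-debuginfo-common-${release}"; do
    # -x: whole-line match, -F: fixed string, -q: no output
    if ! printf '%s\n' "$pkgs" | grep -qxF "$want"; then
      echo "missing: $want"
      return 1
    fi
  done
  echo "ok: debuginfo matches ${release}"
}
```

Usage: check_debuginfo "$(uname -r)" "$(rpm -qa | grep kernel-uek-debuginfo)". If the node later boots a different kernel, pass the RELEASE shown in the crash header instead of uname -r.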

 

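In 19c, set the REBOOT_OPTS attribute of the cssd and cssdmonitor resources so that when Grid evicts the node or the node crashes, the OS generates a core (vmcore) file.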

Enable crash dump:

/u01/app/grid/bin/crsctl modify type ora.cssd.type -attr "ATTRIBUTE=REBOOT_OPTS, TYPE=string, DEFAULT_VALUE=,FLAGS=CONFIG" -init
/u01/app/grid/bin/crsctl modify type ora.cssdmonitor.type -attr "ATTRIBUTE=REBOOT_OPTS,TYPE=string, DEFAULT_VALUE=,FLAGS=CONFIG" -init
/u01/app/grid/bin/crsctl modify res ora.cssd -attr "REBOOT_OPTS=CRASHDUMP" -init
/u01/app/grid/bin/crsctl modify res ora.cssdmonitor -attr "REBOOT_OPTS=CRASHDUMP" -init


Disable crash dump:

/u01/app/grid/bin/crsctl modify res ora.cssd -attr "REBOOT_OPTS=" -init
/u01/app/grid/bin/crsctl modify res ora.cssdmonitor -attr "REBOOT_OPTS=" -init
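To confirm which setting is in effect, REBOOT_OPTS can be read back from both init resources. This is a sketch under assumptions: it assumes `crsctl stat res <name> -init -p` prints one ATTRIBUTE=value per line and that the Grid home is /u01/app/grid as above; show_reboot_opts is a hypothetical name. Note also that the vmcore itself is written by kdump, so the kdump service presumably has to be configured and running (on OL7, check with `systemctl status kdump`).

```shell
#!/usr/bin/env bash
# Hypothetical helper: print REBOOT_OPTS for the cssd and cssdmonitor init resources.
# Assumes `crsctl stat res <res> -init -p` prints one ATTRIBUTE=value per line.
show_reboot_opts() {
  local grid_home="${1:-/u01/app/grid}"
  local res val
  for res in ora.cssd ora.cssdmonitor; do
    # Keep only the text after "REBOOT_OPTS="
    val="$("$grid_home/bin/crsctl" stat res "$res" -init -p | sed -n 's/^REBOOT_OPTS=//p')"
    # An empty value means crash dump is disabled
    printf '%s REBOOT_OPTS=%s\n' "$res" "${val:-<empty>}"
  done
}
```

Run as root: show_reboot_opts /u01/app/grid. After enabling, both lines should show CRASHDUMP; after disabling, both should show <empty>.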

 

In 11g, to enable crash dump, refer to MOS note "Pre-11.2: Using Diagwait as a diagnostic to get more information for diagnosing Oracle Clusterware Node evictions (Doc ID 559365.1)":
[+ASM1]@ht01[/home/grid]$crsctl get css diagwait
CRS-4678: Successful get diagwait 0 for Cluster Synchronization Services.
[root@ht01 ~]# /u01/app/grid/bin/crsctl set css diagwait 13
CRS-4684: Successful set of parameter diagwait to 13 for Cluster Synchronization Services.
[+ASM1]@ht01[/home/grid]$crsctl get css diagwait
CRS-4678: Successful get diagwait 13 for Cluster Synchronization Services
Disable crash dump in 11g:
crsctl unset css diagwait -force
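When checking many 11g nodes, the diagwait value can be pulled out of the `crsctl get css diagwait` message shown above. A minimal sketch; diagwait_value and diagwait_enabled are hypothetical names, and the parsing relies only on the CRS-4678 message format seen in this article.

```shell
#!/usr/bin/env bash
# Hypothetical helper: extract N from "CRS-4678: Successful get diagwait N for ..."
diagwait_value() {
  sed -n 's/.*get diagwait \([0-9][0-9]*\) .*/\1/p'
}

# Succeeds (exit status 0) when diagwait is set to a non-zero value
diagwait_enabled() {
  local v
  v="$(diagwait_value)"
  [ -n "$v" ] && [ "$v" -gt 0 ]
}
```

Usage: crsctl get css diagwait | diagwait_enabled && echo "diagwait is set".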

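Test: kill the ocssd.bin process. cssdmonitor then makes the OS reboot automatically, and a vmcore is written under /var/crash. Open it with crash, using the vmlinux from the debuginfo package.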
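After several test evictions, /var/crash may hold more than one dump. The sketch below picks the newest vmcore by modification time and opens it. It assumes the default /var/crash/<ip>-<date>/vmcore layout seen in the command that follows; latest_vmcore and open_latest_vmcore are hypothetical names.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: locate the newest vmcore and open it with crash.
# Assumes the /var/crash/<ip>-<date>/vmcore layout used in this article.
latest_vmcore() {
  local dir="${1:-/var/crash}"
  # -t sorts by modification time, newest first
  ls -1t "$dir"/*/vmcore 2>/dev/null | head -n 1
}

open_latest_vmcore() {
  # The vmlinux must match the kernel that crashed; defaults to the running kernel
  local release="${1:-$(uname -r)}"
  local vmcore
  vmcore="$(latest_vmcore)"
  if [ -z "$vmcore" ]; then
    echo "no vmcore found under /var/crash" >&2
    return 1
  fi
  crash "/lib/debug/lib/modules/${release}/vmlinux" "$vmcore"
}
```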

[root@ht02 ~]# crash /lib/debug/lib/modules/4.14.35-1902.3.2.el7uek.x86_64/vmlinux /var/crash/127.0.0.1-2022-06-23-05:27:58/vmcore

crash 7.2.3-10.el7
Copyright (C) 2002-2017 Red Hat, Inc.
Copyright (C) 2004, 2005, 2006, 2010 IBM Corporation
Copyright (C) 1999-2006 Hewlett-Packard Co
Copyright (C) 2005, 2006, 2011, 2012 Fujitsu Limited
Copyright (C) 2006, 2007 VA Linux Systems Japan K.K.
Copyright (C) 2005, 2011 NEC Corporation
Copyright (C) 1999, 2002, 2007 Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002 Mission Critical Linux, Inc.
This program is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions. Enter "help copying" to see the conditions.
This program has absolutely no warranty. Enter "help warranty" for details.

GNU gdb (GDB) 7.6
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-unknown-linux-gnu"...

WARNING: kernel relocated [752MB]: patching 90846 gdb minimal_symbol values

please wait... (patching 90846 gdb minimal_symbol values)
KERNEL: /lib/debug/lib/modules/4.14.35-1902.3.2.el7uek.x86_64/vmlinux
DUMPFILE: /var/crash/127.0.0.1-2022-06-23-05:27:58/vmcore [PARTIAL DUMP]
CPUS: 4
DATE: Thu Jun 23 17:27:50 2022
UPTIME: 00:10:25
LOAD AVERAGE: 1.51, 1.43, 0.84
TASKS: 769
NODENAME: ht02
RELEASE: 4.14.35-1902.3.2.el7uek.x86_64
VERSION: #2 SMP Tue Jul 30 03:59:02 GMT 2019
MACHINE: x86_64 (3194 Mhz)
MEMORY: 14.6 GB
PANIC: "sysrq: SysRq : Trigger a crash"
PID: 3405
COMMAND: "cssdmonitor"
TASK: ffff96f176ddaf80 [THREAD_INFO: ffff96f176ddaf80]
CPU: 1
STATE: TASK_RUNNING (SYSRQ)
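The header fields that matter most here are PANIC, PID and COMMAND, plus RELEASE, which tells you which debuginfo vmlinux is needed. A minimal sketch for pulling them out of a saved header, assuming the header was saved to a text file (for example with `sys > /tmp/crash_sys.txt` inside crash, since the `sys` command prints the same banner); summarize_crash_header is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print KEY=value for the key fields of a crash header read on stdin.
# Keys may be right-aligned with leading spaces, as in real crash output.
summarize_crash_header() {
  awk -F': ' '
    { key = $1; sub(/^ +/, "", key) }
    key == "RELEASE" || key == "PANIC" || key == "PID" || key == "COMMAND" {
      # Value is everything after the first ": " (PANIC strings contain more colons)
      print key "=" substr($0, index($0, ": ") + 2)
    }'
}
```

Usage: summarize_crash_header < /tmp/crash_sys.txt. For the dump above this would list RELEASE, PANIC, PID and COMMAND in that order.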

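The crash header shows that the panic was a SysRq-triggered crash issued by the cssdmonitor process. Next, check the trace file ohasd_orarootagent_root.trc: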

2022-06-23 17:27:50.559 : CSSCLNT:3548346112: clsssRecvMsgA: got a disconnect from the server while waiting for message type 27
2022-06-23 17:27:50.559 :GIPCXCPT:3548346112:  gipcInternalSend: connection not valid for send operation endp 0x7f78b40811b0 [00000000000006de] { gipcEndpoint : localAddr 'clsc://(ADDRESS=(PROTOCOL=ipc)(KEY=)(GIPCID=dd87820c-fc4df1b5-3382))', remoteAddr 'clsc://(ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_ht02_)(GIPCID=fc4df1b5-dd87820c-3433))', numPend 0, numReady 0, numDone 0, numDead 0, numTransfer 0, objFlags 0x0, pidPeer 3433, readyRef (nil), ready 0, wobj 0x7f78b406c760, sendp (nil) status 0flags 0x2003861e, flags-2 0x0, usrFlags 0x20010 }, ret gipcretConnectionLost (12)
2022-06-23 17:27:50.559 :GIPCXCPT:3548346112:  gipcSendSyncF [clsssServerRPC_int : clsss.c : 8292]: EXCEPTION[ ret gipcretConnectionLost (12) ]  failed to send on endp 0x7f78b40811b0 [00000000000006de] { gipcEndpoint : localAddr 'clsc://(ADDRESS=(PROTOCOL=ipc)(KEY=)(GIPCID=dd87820c-fc4df1b5-3382))', remoteAddr 'clsc://(ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_ht02_)(GIPCID=fc4df1b5-dd87820c-3433))', numPend 0, numReady 0, numDone 0, numDead 0, numTransfer 0, objFlags 0x0, pidPeer 3433, readyRef (nil), ready 0, wobj 0x7f78b406c760, sendp (nil) status 0flags 0x2003861e, flags-2 0x0, usrFlags 0x20010 }, addr 0000000000000000, buf 0x7f78d37eb6f8, len 80, flags 0x8000000
2022-06-23 17:27:50.559 : CSSCLNT:3548346112: clsssServerRPC: send failed with err 12, msg type 7

2022-06-23 17:27:50.559 : CSSCLNT:3548346112: clsssCommonClientExit: RPC failure, rc 3

2022-06-23 17:27:50.559 : USRTHRD:4038760192: [     INFO]  clsnpoll_BlockMsg: lost connection with CSS
2022-06-23 17:27:50.559 : USRTHRD:4038760192: [     INFO]  clsnpoll_BlockMsg: calling sync
Trace file /u01/app/11.2.0/grid/diag/crs/ht02/crs/trace/ohasd_cssdmonitor_root.trc
Oracle Database 19c Clusterware Release 19.0.0.0.0 - Production
Version 19.3.0.0.0 Copyright 1996, 2019 Oracle. All rights reserved.
    CLSB:429202688: [     INFO] Argument count (argc) for this daemon is 1
    CLSB:429202688: [     INFO] Argument 0 is: /u01/app/grid/bin/cssdmonitor
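The trace timestamps (17:27:50.559) line up with the crash DATE (Thu Jun 23 17:27:50 2022), which is what ties the reboot to the lost CSS connection. A minimal sketch for listing those events with their timestamps, so they can be compared with the crash DATE; css_loss_events is a hypothetical name, and the match patterns are simply the messages quoted in the trace above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "timestamp  event" for trace lines showing the
# connection to ocssd going away. Reads the files given as arguments, or stdin.
css_loss_events() {
  awk '
    # The first 23 characters are the timestamp, e.g. "2022-06-23 17:27:50.559"
    /got a disconnect from the server/ { print substr($0, 1, 23) "  CSS client disconnect"; next }
    /gipcretConnectionLost/            { print substr($0, 1, 23) "  GIPC connection lost"; next }
    /lost connection with CSS/         { print substr($0, 1, 23) "  lost connection with CSS" }
  ' "$@"
}
```

Usage: css_loss_events ohasd_orarootagent_root.trc (run it in the trace directory).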

  

Source: https://www.cnblogs.com/omsql/p/16415216.html