13#
Posted on 2014-8-8 23:24:35
From the OSW ps output:
zzz ***二 8月 5 02:14:06 CST 2014
USER PID PPID PRI %CPU %MEM VSZ RSS WCHAN S STIME TIME COMMAND
oracle 26138 1 59 0.0 10.0 84094312 52627624 602fdeae5dc S 8月_03 00:05 ora_q003_ards1
oracle 25780 1 59 0.0 10.0 84092584 52628192 602fdeae52c S 16:12:10 00:01 ora_q005_ards1
oracle 17502 1 59 0.0 10.0 84091816 52627888 602fdeae4bc S 02:07:35 00:00 ora_w000_ards1
oracle 3351 1 59 0.0 10.0 84093808 52619072 602fdeae58c S 7月_31 00:23 ora_smco_ards1
zzz ***二 8月 5 02:15:13 CST 2014
USER PID PPID PRI %CPU %MEM VSZ RSS WCHAN S STIME TIME COMMAND
After this point the ps output has no data, whereas other commands such as mpstat only stop producing data after 02:35.
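To pin down exactly when ps stopped returning rows, the archive can be split on the `zzz ***` timestamp lines and each snapshot checked for process rows. This is only a rough sketch: the function name `empty_snapshots` is made up for illustration, and it assumes every snapshot starts with a `zzz ***` line followed by a `USER ...` header, as in the excerpt above.

```python
def empty_snapshots(text):
    """Return the timestamps of snapshots that have a header but no process rows."""
    snapshots = []  # each entry is [timestamp, row_count]
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("zzz ***"):
            # A new snapshot begins; keep whatever follows the marker as its label.
            snapshots.append([line[len("zzz ***"):].strip(), 0])
        elif snapshots and line and not line.startswith("USER "):
            # Any non-empty, non-header line counts as a process row.
            snapshots[-1][1] += 1
    return [ts for ts, rows in snapshots if rows == 0]


SAMPLE = """\
zzz ***二 8月 5 02:14:06 CST 2014
USER PID PPID PRI %CPU %MEM VSZ RSS WCHAN S STIME TIME COMMAND
oracle 26138 1 59 0.0 10.0 84094312 52627624 602fdeae5dc S 8月_03 00:05 ora_q003_ards1
zzz ***二 8月 5 02:15:13 CST 2014
USER PID PPID PRI %CPU %MEM VSZ RSS WCHAN S STIME TIME COMMAND
"""

print(empty_snapshots(SAMPLE))
```

Running the same check over the mpstat archive would need a different header test, since that file has its own column layout.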
The messages log also shows an entry at 02:14:
Aug 5 02:14:47 odsdb-node01 in.mpathd[202]: [ID 585766 daemon.error] Cannot meet requested failure detection time of 10000 ms on (inet igb0) new failure detection time for group "public" is 123858 ms
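To see how far the detection time drifted from what was requested, the two values can be pulled out of the in.mpathd line. A small sketch, assuming the message wording is exactly as in the line above; the function name `parse_mpathd` and the dictionary keys are my own.

```python
import re

MPATHD_RE = re.compile(
    r"Cannot meet requested failure detection time of (\d+) ms"
    r"\s+on \(inet (\w+)\)"
    r"\s+new failure detection time for group \"([^\"]+)\" is (\d+) ms"
)


def parse_mpathd(line):
    """Extract requested vs. actual failure detection time, or None if no match."""
    m = MPATHD_RE.search(line)
    if m is None:
        return None
    return {
        "interface": m.group(2),
        "group": m.group(3),
        "requested_ms": int(m.group(1)),
        "actual_ms": int(m.group(4)),
    }


LINE = (
    "Aug 5 02:14:47 odsdb-node01 in.mpathd[202]: [ID 585766 daemon.error] "
    "Cannot meet requested failure detection time of 10000 ms on (inet igb0) "
    'new failure detection time for group "public" is 123858 ms'
)
info = parse_mpathd(LINE)
print(info["actual_ms"] / info["requested_ms"])
```

Here the new detection time is more than twelve times the requested 10000 ms, which fits the picture of the host already being badly stalled at 02:14.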
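Then, a little over 10 minutes later, the database reported that processes could not be started; the instance is probably just a victim.
After that the system went down.
There is a MOS note that looks relevant, but since I am not well versed in Solaris I cannot confirm that it applies. I suggest getting the host administrators involved.
Solaris 10 Server or Solaris Cluster Node/System/Server Rebooted Itself With Panic String "forced crash dump initiated at user request" (Doc ID 1388823.1)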
The Solaris system crashes with "forced crash dump initiated at user request".
The panic message looks like:
node1 ^Mpanic[cpu1]/thread=2a10007fca0:
node1 unix: [ID 156897 kern.notice] forced crash dump initiated at user request
node1 unix: [ID 100000 kern.notice]
node1 genunix: [ID 723222 kern.notice] 000002a10007f700 genunix:kadmin+544 (b4, 1, 0, 0, 5, 1285800)
node1 genunix: [ID 179002 kern.notice] %l0-3: 0000000001815400 00000000011ee684 00000000011ee400 0000000000000004
node1 %l4-7: 0000000000000004 00000000000004e8 0000000000000004 0000000000000004
node1 genunix: [ID 723222 kern.notice] 000002a10007f7c0 ntwdt:ntwdt_enforce_timeout+70 (6001153bb30, 60011694370, 7005d800, 18c0400, 7b7d1c00, 0)
node1 genunix: [ID 179002 kern.notice] %l0-3: 0000000000000800 00000000018d2c00 0000000000000000 0000000001913c00
node1 %l4-7: 0000060011694370 0000000000000001 000000007b7d1c00 0000000000000001
node1 pcisch: [ID 569400 kern.warning] WARNING: pcisch2: ino 0x22 blocked
node1 pcisch: [ID 548919 kern.info] rmc_comm-0#0
Cause
There is a Solaris driver called ntwdt (Netra-based application watchdog timer driver). It is a watchdog mechanism that detects a system hang, or an application hang or crash. The watchdog is a timer that is continually reset by a user application as long as the operating system and user application are running. When the application stops and the timeout is reached, the ntwdt driver generates a crash dump and panics the system. See the ntwdt(7D) man page.
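The reset-or-expire behaviour described above can be modelled in a few lines. This is a toy illustration of the general watchdog idea only, not how the ntwdt driver is implemented; the class name, method names, and the injectable `clock` parameter are all assumptions made for the example.

```python
import time


class Watchdog:
    """Toy watchdog: it must be patted before the timeout or it fires once."""

    def __init__(self, timeout_s, on_expire, clock=time.monotonic):
        self.timeout_s = timeout_s
        self.on_expire = on_expire  # stands in for "dump and panic"
        self.clock = clock
        self.deadline = clock() + timeout_s
        self.expired = False

    def pat(self):
        """Called by the healthy application to push the deadline out."""
        if not self.expired:
            self.deadline = self.clock() + self.timeout_s

    def check(self):
        """Called periodically; fires on_expire once when the deadline has passed."""
        if not self.expired and self.clock() >= self.deadline:
            self.expired = True
            self.on_expire()
        return self.expired


# Drive the model with a fake clock so the behaviour is deterministic.
now = [0.0]
events = []
wd = Watchdog(10, lambda: events.append("panic"), clock=lambda: now[0])
now[0] = 5
wd.pat()                    # application is alive, deadline moves to t=15
now[0] = 14
still_ok = not wd.check()   # before the deadline, nothing fires
now[0] = 15
fired = wd.check()          # application stopped patting, watchdog fires
```

The point for this thread is that the watchdog firing says the patting stopped; it does not by itself say why.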
Solution
If you see ntwdt, e.g.:
ntwdt:ntwdt_enforce_timeout+0x70(0x600368d67b0)
ntwdt:ntwdt_cyclic_pat+0x90(0x600368d67b0)
in the panic stack, then these are obvious hardware errors.
Check /var/adm/messages to get a clue as to what the real problem is.
The logs of the system's service processor can also be helpful.
If you are unable to determine the cause, contact Oracle Support and provide the crash dump and Explorer output of the system for analysis.
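A quick way to check whether a saved panic stack matches the pattern in the note is to search it for ntwdt frames. A minimal sketch, assuming frames look like `ntwdt:function+offset` with the offset in hex, with or without a `0x` prefix, as in the two stack excerpts above; `ntwdt_frames` is a hypothetical helper name.

```python
import re

# Matches e.g. "ntwdt:ntwdt_enforce_timeout+70" and "ntwdt:ntwdt_cyclic_pat+0x90".
NTWDT_FRAME = re.compile(r"\bntwdt:(\w+)\+(?:0x)?([0-9a-fA-F]+)")


def ntwdt_frames(log_text):
    """Return (function, offset) pairs for every ntwdt frame found in the text."""
    return [(m.group(1), int(m.group(2), 16)) for m in NTWDT_FRAME.finditer(log_text)]


STACK = """\
node1 genunix: [ID 723222 kern.notice] 000002a10007f700 genunix:kadmin+544 (b4, 1, 0, 0, 5, 1285800)
node1 genunix: [ID 723222 kern.notice] 000002a10007f7c0 ntwdt:ntwdt_enforce_timeout+70 (6001153bb30, 60011694370, 7005d800, 18c0400, 7b7d1c00, 0)
ntwdt:ntwdt_cyclic_pat+0x90(0x600368d67b0)
"""
frames = ntwdt_frames(STACK)
print(frames)
```

An empty result on the actual crash dump from this host would suggest the note does not apply here.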