Oracle Database Data Recovery and Performance Tuning

1#
Posted 2012-12-12 15:03:57 | Views: 5587 | Replies: 4
This is the OS log from the node that rebooted:

Dec 12 11:00:48 GIO138 MR_MONITOR[10507]: <MRMON044> Controller ID: 0  Time established since power on: Time 2012-12-12,11:00:48 72738 Seconds
Dec 12 12:00:48 GIO138 MR_MONITOR[10507]: <MRMON044> Controller ID: 0  Time established since power on: Time 2012-12-12,12:00:48 76338 Seconds
Dec 12 13:00:48 GIO138 MR_MONITOR[10507]: <MRMON044> Controller ID: 0  Time established since power on: Time 2012-12-12,13:00:48 79937 Seconds
Dec 12 14:00:48 GIO138 MR_MONITOR[10507]: <MRMON044> Controller ID: 0  Time established since power on: Time 2012-12-12,14:00:48 83537 Seconds
Dec 12 14:28:10 GIO138 syslogd 1.4.1: restart.
Dec 12 14:28:10 GIO138 syslog: syslogd startup succeeded
Dec 12 14:28:10 GIO138 kernel: klogd 1.4.1, log source = /proc/kmsg started.
Dec 12 14:28:10 GIO138 kernel: Bootdata ok (command line is ro root=LABEL=/ rhgb quiet)
Dec 12 14:28:10 GIO138 kernel: Linux version 2.6.9-78.ELlargesmp (brewbuilder@ls20-bc2-14.build.redhat.com) (gcc version 3.4.6 20060404 (Red Hat 3.4.6-10)) #1 SMP Wed Jul 9 16:03:59 EDT 2008
Dec 12 14:28:10 GIO138 kernel: BIOS-provided physical RAM map:
Dec 12 14:28:10 GIO138 kernel:  BIOS-e820: 0000000000000000 - 000000000009c000 (usable)
Dec 12 14:28:10 GIO138 kernel:  BIOS-e820: 000000000009c000 - 00000000000a0000 (reserved)
Dec 12 14:28:10 GIO138 kernel:  BIOS-e820: 00000000000ca000 - 0000000000100000 (reserved)
Dec 12 14:28:10 GIO138 kernel:  BIOS-e820: 0000000000100000 - 000000007ffe0000 (usable)
Dec 12 14:28:10 GIO138 kernel:  BIOS-e820: 000000007ffe0000 - 000000007ffea000 (ACPI data)
Dec 12 14:28:10 GIO138 kernel:  BIOS-e820: 000000007ffea000 - 0000000080000000 (ACPI NVS)
Dec 12 14:28:10 GIO138 kernel:  BIOS-e820: 0000000080000000 - 00000000cfc00000 (usable)

This is cssd.log from the other node (the cluster master node):

[    CSSD]2012-12-12 14:25:53.608 [1241577824] >WARNING: clssnmPollingThread: node gio138 (2) at 50% heartbeat fatal, eviction in 29.770 seconds
[    CSSD]2012-12-12 14:25:53.608 [1241577824] >TRACE:   clssnmPollingThread: node gio138 (2) is impending reconfig, flag 1037, misstime 30230
[    CSSD]2012-12-12 14:25:53.608 [1241577824] >TRACE:   clssnmPollingThread: diskTimeout set to (57000)ms impending reconfig status(1)
[    CSSD]2012-12-12 14:25:54.600 [1241577824] >WARNING: clssnmPollingThread: node gio138 (2) at 50% heartbeat fatal, eviction in 28.780 seconds
[    CSSD]2012-12-12 14:26:08.606 [1241577824] >WARNING: clssnmPollingThread: node gio138 (2) at 75% heartbeat fatal, eviction in 14.780 seconds
[    CSSD]2012-12-12 14:26:09.608 [1241577824] >WARNING: clssnmPollingThread: node gio138 (2) at 75% heartbeat fatal, eviction in 13.770 seconds
[    CSSD]2012-12-12 14:26:17.603 [1241577824] >WARNING: clssnmPollingThread: node gio138 (2) at 90% heartbeat fatal, eviction in 5.780 seconds
[    CSSD]2012-12-12 14:26:18.604 [1241577824] >WARNING: clssnmPollingThread: node gio138 (2) at 90% heartbeat fatal, eviction in 4.780 seconds
[    CSSD]2012-12-12 14:26:19.606 [1241577824] >WARNING: clssnmPollingThread: node gio138 (2) at 90% heartbeat fatal, eviction in 3.770 seconds
[    CSSD]2012-12-12 14:26:20.598 [1241577824] >WARNING: clssnmPollingThread: node gio138 (2) at 90% heartbeat fatal, eviction in 2.780 seconds
[    CSSD]2012-12-12 14:26:21.600 [1241577824] >WARNING: clssnmPollingThread: node gio138 (2) at 90% heartbeat fatal, eviction in 1.780 seconds
[    CSSD]2012-12-12 14:26:22.602 [1241577824] >WARNING: clssnmPollingThread: node gio138 (2) at 90% heartbeat fatal, eviction in 0.780 seconds
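
The countdown in these warnings can be cross-checked against the `misstime` value in the TRACE line. The sketch below is only illustrative: it assumes `misstime` is the number of milliseconds since the last heartbeat received from gio138 (my reading of the log, not something the thread confirms), and the helper names are made up for this example.

```python
import re

# First WARNING and first TRACE line, transcribed from the cssd.log excerpt above.
WARNING = ("[    CSSD]2012-12-12 14:25:53.608 [1241577824] >WARNING: "
           "clssnmPollingThread: node gio138 (2) at 50% heartbeat fatal, "
           "eviction in 29.770 seconds")
TRACE = ("[    CSSD]2012-12-12 14:25:53.608 [1241577824] >TRACE:   "
         "clssnmPollingThread: node gio138 (2) is impending reconfig, "
         "flag 1037, misstime 30230")


def eviction_countdown_ms(line):
    """Return the 'eviction in S.mmm seconds' value as integer milliseconds."""
    match = re.search(r"eviction in (\d+)\.(\d{3}) seconds", line)
    return int(match.group(1)) * 1000 + int(match.group(2))


def misstime_ms(line):
    """Return the 'misstime N' value (ASSUMED to be milliseconds)."""
    match = re.search(r"misstime (\d+)", line)
    return int(match.group(1))


# Time already missed plus time left before eviction = implied heartbeat timeout.
implied_timeout_ms = misstime_ms(TRACE) + eviction_countdown_ms(WARNING)
print(implied_timeout_ms)  # 60000
```

Under that assumption the two values add up to 60 seconds, which would be consistent with a CSS misscount of 60 seconds. The actual setting on this cluster is not shown in the thread; `crsctl get css misscount` should report it.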

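Because the master node's clock runs about two minutes behind the other node's, it is hard to tell from the logs whether a system fault caused the heartbeat errors or a heartbeat failure caused the reboot.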
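
One way to make the two logs comparable is to shift the cssd.log timestamps onto gio138's clock. The sketch below is a rough worked calculation, not a diagnosis: the 120-second `SKEW` is an assumption taken from the "about two minutes" estimate, `misstime` is assumed to be milliseconds since the last heartbeat, and the function name is invented for illustration.

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S.%f"

# Timestamps and values copied from the two log excerpts above.
first_warning = datetime.strptime("2012-12-12 14:25:53.608", FMT)   # master clock
misstime = timedelta(milliseconds=30230)                            # from the TRACE line
last_warning = datetime.strptime("2012-12-12 14:26:22.602", FMT)    # master clock
countdown = timedelta(milliseconds=780)                             # "eviction in 0.780 seconds"
syslog_restart = datetime.strptime("2012-12-12 14:28:10.000", FMT)  # gio138 clock

# ASSUMPTION: the master clock is exactly 120 s behind gio138's clock.
SKEW = timedelta(seconds=120)


def to_gio138_clock(master_time, skew=SKEW):
    """Convert a master-node timestamp to gio138's clock under the assumed skew."""
    return master_time + skew


last_heartbeat = to_gio138_clock(first_warning - misstime)
eviction_due = to_gio138_clock(last_warning + countdown)

print("last heartbeat seen:", last_heartbeat.time())
print("syslogd restarted:  ", syslog_restart.time())
print("eviction deadline:  ", eviction_due.time())
```

With an exact 120-second skew, the last heartbeat lands at 14:27:23 and the eviction deadline at 14:28:23 on gio138's clock, so syslogd would have restarted about 13 seconds before the deadline. That margin is small compared with the uncertainty in "about two minutes", so the real offset between the nodes needs to be measured before drawing any conclusion from this ordering.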

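It is clear that the system rebooted abnormally; nothing unusual was logged before the reboot.

Questions:
1. If a heartbeat failure caused the node reboot, would the system shut down cleanly or abnormally? In that case the node's system log should record the cluster stack shutting down, shouldn't it?

2. How can I tell whether an abnormal system reboot caused the heartbeat errors, or whether heartbeat errors caused the abnormal reboot?

3. If it is a heartbeat problem, how can I determine whether the network is the cause? Reboots happen frequently, and the network cables and the switch have already been replaced, so it seems unlikely that the new network equipment is also faulty. In addition, sar shows that the load on both nodes was low before the reboot, with CPU idle above 85%.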
2#
Posted 2012-12-12 15:08:46
You didn't mention the version or the operating system!

Please upload the complete OS log, for example /var/log/messages.


3#
Posted 2012-12-12 15:10:37
Last edited by gdpr-dba on 2012-12-12 15:13

Sorry, I forgot to mention that.

Database version: 10.2.0.4 RAC
Operating system: RH5.5

syslog.zip

38.25 KB, downloads: 785


4#
Posted 2012-12-12 15:33:28
Dec 12 12:00:48 GIO138 MR_MONITOR[10507]: <MRMON044> Controller ID: 0  Time established since power on: Time 2012-12-12,12:00:48 76338 Seconds
Dec 12 13:00:48 GIO138 MR_MONITOR[10507]: <MRMON044> Controller ID: 0  Time established since power on: Time 2012-12-12,13:00:48 79937 Seconds
Dec 12 14:00:48 GIO138 MR_MONITOR[10507]: <MRMON044> Controller ID: 0  Time established since power on: Time 2012-12-12,14:00:48 83537 Seconds
Dec 12 14:28:10 GIO138 syslogd 1.4.1: restart.
Dec 12 14:28:10 GIO138 syslog: syslogd startup succeeded
Dec 12 14:28:10 GIO138 kernel: klogd 1.4.1, log source = /proc/kmsg started.
Dec 12 14:28:10 GIO138 kernel: Bootdata ok (command line is ro root=LABEL=/ rhgb quiet)
Dec 12 14:28:10 GIO138 kernel: Linux version 2.6.9-78.ELlargesmp (brewbuilder@ls20-bc2-14.build.redhat.com) (gcc version 3.4.6 20060404 (Red Hat 3.4.6-10)) #1 SMP Wed Jul 9 16:03:59 EDT 2008
Dec 12 14:28:10 GIO138 kernel: BIOS-provided physical RAM map:
Dec 12 14:28:10 GIO138 kernel:  BIOS-e820: 0000000000000000 - 000000000009c000 (usable)
Dec 12 14:28:10 GIO138 kernel:  BIOS-e820: 000000000009c000 - 00000000000a0000 (reserved)
Dec 12 14:28:10 GIO138 kernel:  BIOS-e820: 00000000000ca000 - 0000000000100000 (reserved)
Dec 12 14:28:10 GIO138 kernel:  BIOS-e820: 0000000000100000 - 000000007ffe0000 (usable)
Dec 12 14:28:10 GIO138 kernel:  BIOS-e820: 000000007ffe0000 - 000000007ffea000 (ACPI data)
Dec 12 14:28:10 GIO138 kernel:  BIOS-e820: 000000007ffea000 - 0000000080000000 (ACPI NVS)
Dec 12 14:28:10 GIO138 kernel:  BIOS-e820: 0000000080000000 - 00000000cfc00000 (usable)
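Please zip and upload the logs under /etc/oracle. Judging from the OS log, the reboot does not appear to have been requested directly by Clusterware, because a CRS-initiated reboot should leave a record in the OS log.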
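
The "no record before the reboot" check can be sketched in a few lines. This is only a rough illustration: the function names are invented here, the five-minute window is an arbitrary choice, and the year has to be supplied because syslog lines do not carry one.

```python
from datetime import datetime, timedelta


def parse_stamp(line, year):
    """Parse the leading 'Mon DD HH:MM:SS' syslog timestamp; syslog omits the year."""
    return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")


def entries_before_restart(lines, year, window=timedelta(minutes=5)):
    """Return the log lines written within `window` before the first syslogd restart."""
    for index, line in enumerate(lines):
        if "syslogd" in line and "restart" in line:
            restart = parse_stamp(line, year)
            return [earlier for earlier in lines[:index]
                    if restart - parse_stamp(earlier, year) <= window]
    return []


# Three lines copied from the OS log quoted in this thread.
SAMPLE = [
    "Dec 12 14:00:48 GIO138 MR_MONITOR[10507]: <MRMON044> Controller ID: 0  "
    "Time established since power on: Time 2012-12-12,14:00:48 83537 Seconds",
    "Dec 12 14:28:10 GIO138 syslogd 1.4.1: restart.",
    "Dec 12 14:28:10 GIO138 syslog: syslogd startup succeeded",
]

print(entries_before_restart(SAMPLE, 2012))  # []
```

For the excerpt above the result is empty: the last entry before the restart is the hourly MR_MONITOR line at 14:00:48, more than 27 minutes earlier. Running the same check over the full /var/log/messages would show whether anything at all was written shortly before the node went down.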


5#
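Posted 2012-12-12 23:12:59
Liu Maclean(刘相兵 posted on 2012-12-12 15:33
Please zip and upload the logs under /etc/oracle. Judging from the OS log, the reboot does not appear to have been requested directly by Clusterware, because if CRS had requested ...

Thanks, Liu. The attachment has been uploaded.

oprocd.zip

32.95 KB, downloads: 785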

