1#
Posted on 2012-2-1 14:43:29 | Views: 17067 | Replies: 24
OS: SUSE 10 SP2
Database version: 10.2.0.4.0
Storage: raw devices + ASM
Problem: CRS on node 1 will not start; node 2 is working normally.
RAC1:/oracle_crs/product/10.2/crs/bin # ./olsnodes -n
rac1    1
rac2    2
RAC2:/oracle_crs/product/10.2/crs/bin # ./crs_stat -t
Name           Type           Target    State     Host        
------------------------------------------------------------
ora.orcl.db    application    ONLINE    ONLINE    rac2        
ora....l1.inst application    ONLINE    OFFLINE               
ora....l2.inst application    ONLINE    ONLINE    rac2        
ora....SM1.asm application    ONLINE    OFFLINE               
ora....C1.lsnr application    ONLINE    OFFLINE               
ora.rac1.gsd   application    ONLINE    OFFLINE               
ora.rac1.ons   application    ONLINE    OFFLINE               
ora.rac1.vip   application    ONLINE    ONLINE    rac2        
ora....SM2.asm application    ONLINE    ONLINE    rac2        
ora....C2.lsnr application    ONLINE    ONLINE    rac2        
ora.rac2.gsd   application    ONLINE    ONLINE    rac2        
ora.rac2.ons   application    ONLINE    ONLINE    rac2        
ora.rac2.vip   application    ONLINE    ONLINE    rac2        
hosts configuration:
Node 1:
RAC1:~ # cat /etc/hosts | grep -v ^#|grep -v ^$
127.0.0.1       localhost
::1             localhost ipv6-localhost ipv6-loopback
fe00::0         ipv6-localnet
ff00::0         ipv6-mcastprefix
ff02::1         ipv6-allnodes
ff02::2         ipv6-allrouters
ff02::3         ipv6-allhosts
192.168.1.23    RAC1
192.168.1.25    RAC2
10.10.10.1      RAC1-priv
10.10.10.2      RAC2-priv
192.168.1.24    RAC1-vip
192.168.1.26    RAC2-vip
Node 2:
RAC2:~ # cat /etc/hosts| grep -v ^#|grep -v ^$
127.0.0.1       localhost
::1             localhost ipv6-localhost ipv6-loopback
fe00::0         ipv6-localnet
ff00::0         ipv6-mcastprefix
ff02::1         ipv6-allnodes
ff02::2         ipv6-allrouters
ff02::3         ipv6-allhosts
192.168.1.23    RAC1
192.168.1.25    RAC2
10.10.10.1      RAC1-priv
10.10.10.2      RAC2-priv
192.168.1.24    RAC1-vip
192.168.1.26    RAC2-vip
Network configuration:
Node 1:
RAC1:/oracle_crs/product/10.2/crs/bin # ./oifcfg getif
eth0  192.168.1.0  global  public
eth1  10.10.10.0  global  cluster_interconnect
Node 2:
RAC2:/oracle_crs/product/10.2/crs/bin # ./oifcfg getif
eth0  192.168.1.0  global  public
eth1  10.10.10.0  global  cluster_interconnect
Storage:
Node 1:
RAC1:~ # ll /dev/raw/
total 0
crw-rw---- 1 root   oinstall 162, 1 Feb  1 10:55 raw1
crw-rw---- 1 root   oinstall 162, 2 Feb  1 10:55 raw2
crw-rw---- 1 oracle oinstall 162, 3 Feb  1 14:17 raw3
crw-rw---- 1 oracle oinstall 162, 4 Feb  1 14:17 raw4
crw-rw---- 1 oracle oinstall 162, 5 Feb  1 14:17 raw5
Node 2:
RAC2:~ # ll /dev/raw/
total 0
crw-rw---- 1 root   oinstall 162, 1 Feb  1 14:17 raw1
crw-rw---- 1 root   oinstall 162, 2 Feb  1 14:17 raw2
crw-rw---- 1 oracle oinstall 162, 3 Feb  1 14:17 raw3
crw-rw---- 1 oracle oinstall 162, 4 Feb  1 14:17 raw4
crw-rw---- 1 oracle oinstall 162, 5 Feb  1 14:17 raw5
OCR information:
RAC1:/oracle_crs/product/10.2/crs/bin # ./ocrcheck
Status of Oracle Cluster Registry is as follows :
         Version                  :          2
         Total space (kbytes)     :     511264
         Used space (kbytes)      :       3848
         Available space (kbytes) :     507416
         ID                       : 1274702016
         Device/File Name         : /dev/raw/raw1
                                    Device/File integrity check succeeded
         Device/File Name         : /dev/raw/raw2
                                    Device/File integrity check succeeded
         Cluster registry integrity check succeeded
RAC2:/oracle_crs/product/10.2/crs/bin # ./ocrcheck
Status of Oracle Cluster Registry is as follows :
         Version                  :          2
         Total space (kbytes)     :     511264
         Used space (kbytes)      :       3848
         Available space (kbytes) :     507416
         ID                       : 1274702016
         Device/File Name         : /dev/raw/raw1
                                    Device/File integrity check succeeded
         Device/File Name         : /dev/raw/raw2
                                    Device/File integrity check succeeded
         Cluster registry integrity check succeeded
After a power outage, node 1 was rebooted, but CRS will not come back up.
The attached file contains the log information.

Attachment: log.zip (459.21 KB, downloads: 1484)

2#
Posted on 2012-2-1 14:59:31
From the log, CSSD is in the process of starting:
priority string (4)
setsid: failed with -1/1
s0clssscGetEnvOracleUser: calling getpwnam_r for user oracle
s0clssscGetEnvOracleUser: info for user oracle complete
2012-02-01 10:55 CSSD starting
=================================
2012-02-01 10:55:47.542: [ CSSCLNT][3064780464]clsssInitNative: connect failed, rc 9

2012-02-01 10:55:47.543: [  CRSRTI][3064780464]0CSS is not ready. Received status 3 from CSS. Waiting for good status ..

Could it be that you are just too impatient and not waiting long enough?
Try this: stop the services on node RAC2 and shut that machine down, then start node RAC1 by itself and see how it goes. Once the OS is up, CRS should finish starting within roughly 15 minutes; if it really cannot start, check the logs again. If node RAC1 comes up normally, power RAC2 back on and wait; the problem is probably not serious.
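A minimal sketch of that sequence, run as root from <CRS_HOME>/bin on each node (these are the same crsctl/crs_stat calls used elsewhere in this thread; the 15-minute figure is only the rough expectation given above):

# On RAC2, as root: stop the stack cleanly before powering the node off
./crsctl stop crs
# ... shut down / power off RAC2 ...

# On RAC1, as root: start CRS on its own and watch it come up
./crsctl start crs
./crsctl check crs      # repeat until CSS/CRS/EVM all report healthy
./crs_stat -t           # resources should come ONLINE within roughly 15 minutes

# Only after RAC1 is healthy, power RAC2 back on and let CRS start there too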


3#
Posted on 2012-2-1 15:01:02
The attachment does not contain node 1's ocssd.log; please upload it if you have it.
Also:
1. Are there any crsctl* log files under /tmp on node 1?
2. Is node 1's network OK?


4#
Posted on 2012-2-1 15:03:14
Originally posted by javaio on 2012-2-1 14:59:
From the log, CSSD is in the process of starting:
priority string (4)
setsid: failed with -1/1
s0clssscGetEnvOracleUser: calling getpwnam_r for user oracle
s0clssscGetEnvOracleUser: info for user oracle complete
2012-02-01 10:55  ...



This same problem happened about a year ago. Back then I shut node 2 down, restarted node 1, and then restarted node 2. But there is far more business running now; that kind of operation would have too big an impact, and the customer will not agree to it.


5#
Posted on 2012-2-1 15:05:32

Reply to post #3:

1.RAC1:/tmp # ll crsctl*
/bin/ls: crsctl*: No such file or directory

2. The network is fine.


6#
Posted on 2012-2-1 15:10:16
Sorry about that; here is node 1's ocssd.log.

Attachment: ocssd.zip (1.01 MB, downloads: 1576)


7#
Posted on 2012-2-1 15:10:42
crsctl query css votedisk
Are the voting disks healthy?


8#
Posted on 2012-2-1 15:13:55
Yes, they look normal:
RAC1:/oracle_crs/product/10.2/crs/bin # ./crsctl query css votedisk
0.     0    /dev/raw/raw3
1.     0    /dev/raw/raw4
2.     0    /dev/raw/raw5
located 3 votedisk(s).


RAC2:/oracle_crs/product/10.2/crs/bin # ./crsctl query css votedisk
0.     0    /dev/raw/raw3
1.     0    /dev/raw/raw4
2.     0    /dev/raw/raw5
located 3 votedisk(s).


9#
Posted on 2012-2-1 15:50:43
Originally posted by qingfrog on 2012-2-1 15:10:
Sorry about that; here is node 1's ocssd.log.


"The network is fine" should mean more than just ping succeeding! And that is assuming a single node can even start on its own.
Is there any firewall problem? What does iptables status show?
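For reference, a quick way to verify that on each node might be the following (a sketch; rcSuSEfirewall2 is the stock SLES firewall service and is an assumption about this particular install):

# As root: empty chains with policy ACCEPT mean netfilter is not filtering anything
iptables -L -n

# SLES-specific firewall service status (assumption: stock SuSEfirewall2 packaging)
rcSuSEfirewall2 status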


10#
Posted on 2012-2-1 15:59:23
Also: the ocssd.log is incomplete! Is there any log from around the time of the failure?
And check other things too, such as route.


11#
Posted on 2012-2-1 16:13:15
1. The firewall is completely turned off.
2. The ocssd.log is the complete one from the server.
3. route:
RAC1:~ # route
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
192.168.1.0     *               255.255.255.0   U     0      0        0 eth0
10.10.10.0      *               255.255.255.0   U     0      0        0 eth1
link-local      *               255.255.0.0     U     0      0        0 eth0
loopback        *               255.0.0.0       U     0      0        0 lo
default         192.168.1.254   0.0.0.0         UG    0      0        0 eth0

RAC2:~ # route
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
192.168.1.0     *               255.255.255.0   U     0      0        0 eth0
10.10.10.0      *               255.255.255.0   U     0      0        0 eth1
link-local      *               255.255.0.0     U     0      0        0 eth0
loopback        *               255.0.0.0       U     0      0        0 lo
default         192.168.1.254   0.0.0.0         UG    0      0        0 eth0

RAC1:~ # netstat -in
Kernel Interface table
Iface   MTU Met   RX-OK RX-ERR RX-DRP RX-OVR   TX-OK TX-ERR TX-DRP TX-OVR Flg
eth0   1500   0   22524      0      0   4039  123254      0      0      0 BMRU
eth1   1500   0    2238      0      0   2238      41      0      0      0 BMRU
lo    16436   0   19332      0      0      0   19332      0      0      0 LRU

RAC2:~ # netstat -in
Kernel Interface table
Iface   MTU Met   RX-OK RX-ERR RX-DRP RX-OVR   TX-OK TX-ERR TX-DRP TX-OVR Flg
eth0    1500   0 4140173      0      0 167670 2915652      0      0      0 BMRU
eth0:1  1500   0     - no statistics available -                        BMRU
eth0:2  1500   0     - no statistics available -                        BMRU
eth1    1500   0  110416      0      0 110265    1177      0      0      0 BMRU
lo     16436   0 2333487      0      0      0 2333487      0      0      0 LRU

RAC2:~ # ifconfig
eth0      Link encap:Ethernet  HWaddr 00:14:5E:F4:DE:FA  
          inet addr:192.168.1.25  Bcast:192.168.1.255  Mask:255.255.255.0
          inet6 addr: fe80::214:5eff:fef4:defa/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:4140610 errors:0 dropped:167680 overruns:0 frame:0
          TX packets:2915997 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:995674691 (949.5 Mb)  TX bytes:1086094172 (1035.7 Mb)
          Interrupt:185 Memory:e4000000-e4012100
eth0:1    Link encap:Ethernet  HWaddr 00:14:5E:F4:DE:FA  
          inet addr:192.168.1.26  Bcast:192.168.1.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          Interrupt:185 Memory:e4000000-e4012100
eth0:2    Link encap:Ethernet  HWaddr 00:14:5E:F4:DE:FA  
          inet addr:192.168.1.24  Bcast:192.168.1.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          Interrupt:185 Memory:e4000000-e4012100
eth1      Link encap:Ethernet  HWaddr 00:14:5E:F4:DE:FC  
          inet addr:10.10.10.2  Bcast:10.10.10.255  Mask:255.255.255.0
          inet6 addr: fe80::214:5eff:fef4:defc/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:110423 errors:0 dropped:110272 overruns:0 frame:0
          TX packets:1177 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:15851978 (15.1 Mb)  TX bytes:75600 (73.8 Kb)
          Interrupt:185 Memory:e2000000-e2012100


12#
Posted on 2012-2-1 17:21:19
1. The logs are incomplete.
2. From exactly what point in time did node 1 stop being able to start CRS?
3. Use diagcollection.pl to collect a zip of the logs, even though some of them may already have been deleted.
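A typical invocation looks roughly like the following (a sketch; run as root on each node, and the exact options accepted by diagcollection.pl can vary between CRS versions):

# As root, point the script at the CRS home and collect everything it knows about
export ORA_CRS_HOME=/oracle_crs/product/10.2/crs
cd $ORA_CRS_HOME/bin
./diagcollection.pl --collect    # gathers the CRS/OCR/OS logs into tar.gz archives in the current directory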


13#
Posted on 2012-2-6 15:22:46
Sorry, I was only able to connect to the database today; over the last few days I was not allowed to connect.
The server went down on January 24. The attachment contains the information collected by diagcollection.pl from both nodes, plus the system logs.

Attachment: log.zip (459.21 KB, downloads: 1402)


14#
Posted on 2012-2-6 15:41:49
Log analysis:

RAC1:

2012-01-31 18:36:15.612
[cssd(6838)]CRS-1605:CSSD voting file is online: /dev/raw/raw4. Details in /oracle_crs/product/10.2/crs/log/rac1/cssd/ocssd.log.
2012-01-31 18:36:15.612
[cssd(6838)]CRS-1605:CSSD voting file is online: /dev/raw/raw5. Details in /oracle_crs/product/10.2/crs/log/rac1/cssd/ocssd.log.
2012-01-31 18:36:15.612
[cssd(6838)]CRS-1604:CSSD voting file is offline: /dev/raw/raw5. Details in /oracle_crs/product/10.2/crs/log/rac1/cssd/ocssd.log.
2012-01-31 18:37:29.292
[cssd(6838)]CRS-1605:CSSD voting file is online: /dev/raw/raw5. Details in /oracle_crs/product/10.2/crs/log/rac1/cssd/ocssd.log.
2012-02-01 10:55:48.486
[cssd(6966)]CRS-1605:CSSD voting file is online: /dev/raw/raw3. Details in /oracle_crs/product/10.2/crs/log/rac1/cssd/ocssd.log.
2012-02-01 10:55:48.494
[cssd(6966)]CRS-1605:CSSD voting file is online: /dev/raw/raw4. Details in /oracle_crs/product/10.2/crs/log/rac1/cssd/ocssd.log.
2012-02-01 10:55:48.503
[cssd(6966)]CRS-1605:CSSD voting file is online: /dev/raw/raw5. Details in /oracle_crs/product/10.2/crs/log/rac1/cssd/ocssd.log.



RAC2:

[    CSSD]2012-01-24 08:03:17.381 [2909830048] >TRACE:   clssgmChangeMasterNode: requeued 0 RPCs
[    CSSD]2012-01-24 08:03:17.381 [2909830048] >TRACE:   clssgmMasterCMSync: Synchronizing group/lock status
[    CSSD]2012-01-24 08:03:17.381 [2909830048] >TRACE:   clssgmMasterSendDBDone: group/lock status synchronization complete
[    CSSD]CLSS-3000: reconfiguration successful, incarnation 1 with 1 nodes

[    CSSD]CLSS-3001: local node number 2, master node number 2

[    CSSD]2012-01-24 08:03:17.381 [2909830048] >TRACE:   clssgmReconfigThread:  completed for reconfig(1), with status(1)
[    CSSD]2012-01-24 08:03:17.443 [2968849312] >TRACE:   clssgmCommonAddMember: clsomon joined (2/0x1000000/#CSS_CLSSOMON)


RAC1's logs go up to February 1, while RAC2's logs only go up to January 24.

Earlier, on January 24, rac1 was evicted once because its network heartbeat timed out:

[    CSSD]2012-01-24 07:43:36.314 [2934672288] >WARNING: clssnmPollingThread: node rac1 (1) at 50% heartbeat fatal, eviction in 109.580 seconds
[    CSSD]2012-01-24 07:43:36.314 [2934672288] >TRACE:   clssnmPollingThread: node rac1 (1) is impending reconfig, flag 1037, misstime 110420
[    CSSD]2012-01-24 07:43:37.318 [2934672288] >WARNING: clssnmPollingThread: node rac1 (1) at 50% heartbeat fatal, eviction in 108.580 seconds
[    CSSD]2012-01-24 07:44:31.539 [2934672288] >WARNING: clssnmPollingThread: node rac1 (1) at 75% heartbeat fatal, eviction in 54.350 seconds
[    CSSD]2012-01-24 07:45:04.675 [2934672288] >WARNING: clssnmPollingThread: node rac1 (1) at 90% heartbeat fatal, eviction in 21.220 seconds
[    CSSD]2012-01-24 07:45:05.679 [2934672288] >WARNING: clssnmPollingThread: node rac1 (1) at 90% heartbeat fatal, eviction in 20.210 seconds
[    CSSD]2012-01-24 07:45:06.683 [2934672288] >WARNING: clssnmPollingThread: node rac1 (1) at 90% heartbeat fatal, eviction in 19.210 seconds
[    CSSD]2012-01-24 07:45:07.687 [2934672288] >WARNING: clssnmPollingThread: node rac1 (1) at 90% heartbeat fatal, eviction in 18.210 seconds
[    CSSD]2012-01-24 07:45:08.691 [2934672288] >WARNING: clssnmPollingThread: node rac1 (1) at 90% heartbeat fatal, eviction in 17.200 seconds
[    CSSD]2012-01-24 07:45:09.695 [2934672288] >WARNING: clssnmPollingThread: node rac1 (1) at 90% heartbeat fatal, eviction in 16.200 seconds
[    CSSD]2012-01-24 07:45:10.699 [2934672288] >WARNING: clssnmPollingThread: node rac1 (1) at 90% heartbeat fatal, eviction in 15.190 seconds
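For context, the eviction countdown above is driven by the CSS heartbeat timeouts; as a sketch, they can be read on a node where CSS is up (RAC2 here). The values in the comments are only the usual 10.2 defaults, not this cluster's actual settings:

# Run from <CRS_HOME>/bin on a node with a running CSS daemon
./crsctl get css misscount      # network heartbeat timeout (default 60s on Linux 10.2)
./crsctl get css disktimeout    # voting disk I/O timeout (default 200s in 10.2.0.2 and later)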



However, after rac1 was rebooted, CRS did not come back up normally. Suggestions:

1. Try running crsctl start crs
2. Watch whether anything new appears in RAC1's logs
3. Reboot the RAC1 host OS
4. Check the hardware and the CRS installation with:
cluvfy stage -post hwos -n all
cluvfy stage -post crsinst -n all


15#
Posted on 2012-2-6 16:07:24
1. Ran crsctl start crs; nothing changed in the alert log.
(On January 24 the server simply went down, which is why it was evicted.)

2. The server has already been rebooted and tried again; the situation is exactly the same.

3:
RAC1:
oracle@RAC1:~> cluvfy stage -post hwos -n all
Performing post-checks for hardware and operating system setup
Checking node reachability...
Node reachability check passed from node "RAC1".

Checking user equivalence...
User equivalence check passed for user "oracle".
Checking node connectivity...
Node connectivity check passed for subnet "192.168.1.0" with node(s) rac2,rac1.
Node connectivity check passed for subnet "10.10.10.0" with node(s) rac2,rac1.
Suitable interfaces for the private interconnect on subnet "192.168.1.0":
rac2 eth0:192.168.1.25 eth0:192.168.1.26 eth0:192.168.1.24
rac1 eth0:192.168.1.23
Suitable interfaces for the private interconnect on subnet "10.10.10.0":
rac2 eth1:10.10.10.2
rac1 eth1:10.10.10.1
ERROR:
Could not find a suitable set of interfaces for VIPs.
Node connectivity check failed.

Checking shared storage accessibility...
WARNING:
Package cvuqdisk not installed.
        rac2,rac1

Shared storage check failed on nodes "rac2,rac1".
Post-check for hardware and operating system setup was unsuccessful on all the nodes.

RAC2:

oracle@RAC2:~> cluvfy stage -post hwos -n all
Performing post-checks for hardware and operating system setup
Checking node reachability...
Node reachability check passed from node "RAC2".

Checking user equivalence...
User equivalence check passed for user "oracle".
Checking node connectivity...
Node connectivity check passed for subnet "192.168.1.0" with node(s) rac2,rac1.
Node connectivity check passed for subnet "10.10.10.0" with node(s) rac2,rac1.
Suitable interfaces for the private interconnect on subnet "192.168.1.0":
rac2 eth0:192.168.1.25 eth0:192.168.1.26 eth0:192.168.1.24
rac1 eth0:192.168.1.23
Suitable interfaces for the private interconnect on subnet "10.10.10.0":
rac2 eth1:10.10.10.2
rac1 eth1:10.10.10.1
ERROR:
Could not find a suitable set of interfaces for VIPs.
Node connectivity check failed.

Checking shared storage accessibility...
WARNING:
Package cvuqdisk not installed.
        rac2,rac1

Shared storage check failed on nodes "rac2,rac1".
Post-check for hardware and operating system setup was unsuccessful on all the nodes.


RAC1:
oracle@RAC1:~> cluvfy stage -post crsinst -n all
Performing post-checks for cluster services setup
Checking node reachability...
Node reachability check passed from node "RAC1".

Checking user equivalence...
User equivalence check passed for user "oracle".
Checking Cluster manager integrity...

Checking CSS daemon...
Daemon status check passed for "CSS daemon".
Cluster manager integrity check passed.
Checking cluster integrity...

Cluster integrity check passed

Checking OCR integrity...
Checking the absence of a non-clustered configuration...
All nodes free of non-clustered, local-only configurations.
Uniqueness check for OCR device passed.
Checking the version of OCR...
OCR of correct Version "2" exists.
Checking data integrity of OCR...
Data integrity check for OCR passed.
OCR integrity check passed.
Checking CRS integrity...
Checking daemon liveness...
Liveness check passed for "CRS daemon".
Checking daemon liveness...
Liveness check passed for "CSS daemon".
Checking daemon liveness...
Liveness check passed for "EVM daemon".
Checking CRS health...
CRS health check failed.
Check failed on nodes:
        rac1
CRS integrity check failed.
Checking node application existence...

Checking existence of VIP node application (required)
Check passed.
Checking existence of ONS node application (optional)
Check passed.
Checking existence of GSD node application (optional)
Check passed.

Post-check for cluster services setup was unsuccessful.
Checks did not pass for the following node(s):
        rac1

RAC2:
oracle@RAC2:~> cluvfy stage -post crsinst -n all
Performing post-checks for cluster services setup
Checking node reachability...
Node reachability check passed from node "RAC2".

Checking user equivalence...
User equivalence check passed for user "oracle".
Checking Cluster manager integrity...

Checking CSS daemon...
Daemon status check passed for "CSS daemon".
Cluster manager integrity check passed.
Checking cluster integrity...

Cluster integrity check passed

Checking OCR integrity...
Checking the absence of a non-clustered configuration...
All nodes free of non-clustered, local-only configurations.
Uniqueness check for OCR device passed.
Checking the version of OCR...
OCR of correct Version "2" exists.
Checking data integrity of OCR...
Data integrity check for OCR passed.
OCR integrity check passed.
Checking CRS integrity...
Checking daemon liveness...
Liveness check passed for "CRS daemon".
Checking daemon liveness...
Liveness check passed for "CSS daemon".
Checking daemon liveness...
Liveness check passed for "EVM daemon".
Checking CRS health...
CRS health check failed.
Check failed on nodes:
        rac1
CRS integrity check failed.
Checking node application existence...

Checking existence of VIP node application (required)
Check passed.
Checking existence of ONS node application (optional)
Check passed.
Checking existence of GSD node application (optional)
Check passed.

Post-check for cluster services setup was unsuccessful.
Checks did not pass for the following node(s):
        rac1


16#
Posted on 2012-2-6 16:31:02
:) It looks like CSS is just too lazy to move...

Action plan: debug CSS so that it generates more useful information.

On node 1:

<CRS_HOME>/bin/crsctl debug log css CSSD:4

crsctl stop crs
crsctl start crs

Wait about 10 minutes, then zip up the logs under the cssd directory and upload them.


17#
Posted on 2012-2-6 17:32:36
The command fails on node 1:

RAC1:/oracle_crs/product/10.2/crs/bin # ./crsctl debug log css CSSD:4
Failure 32 in main OCR context initialization: PROC-32: Cluster Ready Services on the local node is not running Messaging error [9]
Failed to set CSS trace level to OCR


18#
Posted on 2012-2-6 18:28:23
Action plan:

1. Run the following on all nodes:


    olsnodes
    crsctl check css
    crsctl check crs status
    crsctl check crsd status
    crsctl check cssd status
    crsctl check evmd status
    crsctl query css votedisk
    crsctl query crs softwareversion
    crsctl query crs activeversion

    cluvfy comp software -verbose -n all
   
   
  Also upload node 1's /var/log/messages.
  
  
2. Regarding debugging:

  Add the following at the top of the <CRS_HOME>/bin/ocssd script:
  
CLSC_TRACE_LVL=5
export CLSC_TRACE_LVL
CLSC_NSTRACE_LVL=12
export CLSC_NSTRACE_LVL

Then try the following commands:
crsctl stop crs
crsctl disable crs
crsctl enable crs
crsctl start crs
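As an illustration of what that edit might look like (in 10.2 the <CRS_HOME>/bin/ocssd file is a small shell wrapper; everything here other than the four exported lines is illustrative, not the real file contents):

#!/bin/sh
# <CRS_HOME>/bin/ocssd -- excerpt, for illustration only
# --- lines added for extra CSS client/NS tracing; remove once the diagnosis is done ---
CLSC_TRACE_LVL=5
export CLSC_TRACE_LVL
CLSC_NSTRACE_LVL=12
export CLSC_NSTRACE_LVL
# --- end of added lines; the rest of the original wrapper follows unchanged ---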


19#
Posted on 2012-2-7 10:07:08
1:
RAC1:/oracle_crs/product/10.2/crs/bin # ./olsnodes
rac1
rac2
RAC1:/oracle_crs/product/10.2/crs/bin # ./crsctl check css
RAC1:/oracle_crs/product/10.2/crs/bin # ./crsctl check crs status                    ----> The check commands above never finish; they just hang there without printing any error message.

RAC1:/oracle_crs/product/10.2/crs/bin # ./crsctl query css votedisk
0.     0    /dev/raw/raw3
1.     0    /dev/raw/raw4
2.     0    /dev/raw/raw5
located 3 votedisk(s).
RAC1:/oracle_crs/product/10.2/crs/bin # ./crsctl query crs softwareversion
CRS software version on node [rac1] is [10.2.0.4.0]
RAC1:/oracle_crs/product/10.2/crs/bin # ./crsctl query crs activeversion
CRS active version on the cluster is [10.2.0.4.0]
oracle@RAC1:~> cluvfy comp software -verbose -n all
Verifying cluster file system
Check: Software
ERROR:
/oracle_crs/product/10.2/crs/cv/cvdata/software_conf.xml (No such file or directory)
Software check failed.
Verification of cluster file system was unsuccessful on all the nodes.


RAC2:/oracle_crs/product/10.2/crs/bin # ./olsnodes
rac1
rac2
RAC2:/oracle_crs/product/10.2/crs/bin # ./crsctl check css
CSS appears healthy
RAC2:/oracle_crs/product/10.2/crs/bin # ./crsctl check crs status
CSS appears healthy
CRS appears healthy
EVM appears healthy
RAC2:/oracle_crs/product/10.2/crs/bin # ./crsctlcheck crsd status
-bash: ./crsctlcheck: No such file or directory
RAC2:/oracle_crs/product/10.2/crs/bin # ./crsctl check crsd status
CRS appears healthy
RAC2:/oracle_crs/product/10.2/crs/bin # ./crsctl check cssd status
CSS appears healthy
RAC2:/oracle_crs/product/10.2/crs/bin # ./crsctl check evmd status
EVM appears healthy
RAC2:/oracle_crs/product/10.2/crs/bin # ./crsctl query css votedisk
0.     0    /dev/raw/raw3
1.     0    /dev/raw/raw4
2.     0    /dev/raw/raw5
located 3 votedisk(s).
RAC2:/oracle_crs/product/10.2/crs/bin # ./crsctl query crs softwareversion
CRS software version on node [rac2] is [10.2.0.4.0]
RAC2:/oracle_crs/product/10.2/crs/bin # ./crsctl query crs activeversion
CRS active version on the cluster is [10.2.0.4.0]
oracle@RAC2:~> cluvfy comp software -verbose -n all
Verifying cluster file system
Check: Software
ERROR:
/oracle_crs/product/10.2/crs/cv/cvdata/software_conf.xml (No such file or directory)              -------> I ran a find for it; the file does not exist.
Software check failed.
Verification of cluster file system was unsuccessful on all the nodes.

Attached are the logs from the cssd directory and the system logs.

Attachment: log.zip (1.2 MB, downloads: 1373)


20#
Posted on 2012-2-7 10:15:01
oracle@RAC2:~> date;ssh rac1 date
Tue Feb  7 10:20:23 CST 2012
Tue Feb  7 10:20:25 CST 2012
oracle@RAC2:~> date;ssh rac1-priv date
Tue Feb  7 10:20:35 CST 2012
Tue Feb  7 10:20:37 CST 2012


21#
Posted on 2012-2-7 16:46:21
Unlike the logs we saw before, RAC1 now contains a large number of entries like:

[    CSSD]2012-02-07 09:58:22.465 [2917886880] >WARNING: clssnmLocalJoinEvent: takeover aborted due to ALIVE node on Disk
[    CSSD]2012-02-07 09:58:22.881 [3036670880] >TRACE:   clssnmReadDskHeartbeat: node(2) is down. rcfg(2) wrtcnt(1210560) LATS(68005176) Disk lastSeqNo(1210560)
[

This indicates a problem with the disk heartbeat. Since your devices are bound as raw devices, I suggest you confirm that the raw bindings have not been mismatched because the underlying device names changed.


22#
Posted on 2012-2-7 16:49:00
If you are binding block devices such as /dev/sd* to raw devices, the following command lists each device's scsi_id so you can compare them:

for i in b c d e f g h i j k; do
  echo "sd$i RESULT==\"`scsi_id -g -u -s /block/sd$i`\""
done
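To see which block device each /dev/raw/rawN is actually bound to right now (so the scsi_id values can be matched up and compared between the two nodes and against the persistent bindings, which on SLES usually live in /etc/raw), a sketch along these lines can help:

# Current raw bindings: each /dev/raw/rawN reports the major/minor numbers of
# the block device it is bound to
raw -qa

# Major/minor numbers and scsi_id of every /dev/sd? device, so the bindings
# above can be matched to a physical LUN and compared line by line on both nodes
ls -l /dev/sd?
for d in /dev/sd?; do
  echo "$d  `scsi_id -g -u -s /block/${d##*/}`"
done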


23#
Posted on 2012-2-8 00:34:40
Thank you, moderator; the problem has been resolved.
I stopped CRS on node 2, then started CRS on node 1, and once it was up I started CRS on node 2 again. Everything is now up:
RAC2:/oracle_crs/product/10.2/crs/bin # ./crs_stat -t
Name           Type           Target    State     Host        
------------------------------------------------------------
ora.orcl.db    application    ONLINE    ONLINE    rac1        
ora....l1.inst application    ONLINE    ONLINE    rac1        
ora....l2.inst application    ONLINE    ONLINE    rac2        
ora....SM1.asm application    ONLINE    ONLINE    rac1        
ora....C1.lsnr application    ONLINE    ONLINE    rac1        
ora.rac1.gsd   application    ONLINE    ONLINE    rac1        
ora.rac1.ons   application    ONLINE    ONLINE    rac1        
ora.rac1.vip   application    ONLINE    ONLINE    rac1        
ora....SM2.asm application    ONLINE    ONLINE    rac2        
ora....C2.lsnr application    ONLINE    ONLINE    rac2        
ora.rac2.gsd   application    ONLINE    ONLINE    rac2        
ora.rac2.ons   application    ONLINE    ONLINE    rac2        
ora.rac2.vip   application    ONLINE    ONLINE    rac2

One question remains: why, as long as node 2 was up, would node 1 absolutely refuse to start? I have now run into this on two different RAC systems, and from the logs I cannot find any obvious clue. Have you seen this kind of tricky problem before, or do you have a good way to approach it? Any pointers would be appreciated. Thanks!


24#
Posted on 2012-2-8 15:25:19
1. The logs are quite messy. For this kind of problem you really need to lay out a timeline of events, but that was not possible in this case, so the root cause and the current state cannot be untangled.

2. RAC1 kept trying to read RAC2's heartbeat from the voting disk but never saw it.

3. I strongly advise against raw-binding the devices; use udev or ASMLib instead.
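As an illustration of the udev approach on this era of SLES (the rule below is only a sketch; the WWID, the raw device number, and the path to the raw binary are placeholders to adapt to the actual system, not values taken from this cluster, and it assumes whole-disk LUNs):

# /etc/udev/rules.d/60-oracle-raw.rules  (illustrative only)
# Re-create the raw binding at boot keyed on the LUN's scsi_id, so the binding no
# longer depends on which /dev/sd* name the kernel hands out after a reboot.
# One rule per OCR / voting / ASM LUN.
ACTION=="add", KERNEL=="sd?", PROGRAM=="/sbin/scsi_id -g -u -s /block/%k", \
  RESULT=="<wwid-of-this-lun>", RUN+="/usr/sbin/raw /dev/raw/raw1 %N"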


25#
Posted on 2012-2-8 15:39:31
Thanks again for your help.
